* Re: SMTC support status in latest git head
From: STUART VENTERS @ 2010-12-14 21:27 UTC
To: kevink; +Cc: linux-mips

Kevin,

It turns out we are also looking at Linux SMTC support for 34kc.
(For a different pmc part.)

You said you remembered seeing it work on at least one version of the kernel.

Could you help us find that version by bracketing the search a bit?

Maybe a date and/or version range to look in.

Regards,

Stuart Venters
Adtran
* Re: SMTC support status in latest git head
From: Kevin D. Kissell @ 2010-12-14 23:01 UTC
To: STUART VENTERS; +Cc: linux-mips

On 12/14/10 13:27, STUART VENTERS wrote:
> Kevin,
>
> It turns out we are also looking at Linux SMTC support for 34kc.
> (For a different pmc part.)
>
> You said you remembered seeing it work on at least one version of the kernel.
>
> Could you help us find that version by bracketing the search a bit?
>
> Maybe a date and/or version range to look in.
>

There were early working versions without dyntick or interrupt affinity
in the 2.6.23/24 timeframe, but as per the commit logs at linux-mips.org,
I finally got the dyntick stuff working in September 2008, with the
commits propagating to various git branches over the following two
months. I can see that the new code was in 2.6.28.1 but not in 2.6.26.8.
At some point subsequent to that, I'm pretty sure I checked out the
then-latest stable version of the Malta branch and got a functional build.

The last time I regression checked it was in March of 2009, at which
point some infrastructure changes had broken things, which I fixed in
patches posted on March 31, 2009, one of which addressed a change in the
semantics of CP0 access macros, and one of which fixed a name conflict.
Those were committed on 3/31 and 5/14/2009, depending on the branch you
look at. With those patches and only those patches on what was then the
latest stable (Malta?) branch at LMO, it seemed to run OK to the limited
degree I was able to have it tested.

Someone else found a hole in smtc_distribute_timer() in November of 2009,
and I worked with the discoverer on a very small patch committed
November 13, 2009, but I never actually ran the code to test (then again,
I'd never been able to drive a system into the failure it could cause).

Sorry to be a little vague, but I no longer have my MIPS Linux
development build or test systems, so I'm reduced to googling and
searching LMO, just like anyone else.

Regards,

Kevin K.
* Re: SMTC support status in latest git head.
From: STUART VENTERS @ 2010-12-16 15:37 UTC
To: kevink, anoop.pa; +Cc: linux-mips, Anoop_P.A

Two other possible clues:

The EVP is clear in the MVPControl register.
Does this say that only VPE0, T0 gets to run?

Also the EXCPT bits in VPEControl for VPE1 indicate a Gating Storage Exception dispatch.
But that seems to conflict with the EVP bit above.

Perhaps these are an artifact of getting to a good state to dump things out.
[parent not found: <4D0A677C.6040104@paralogos.com>]
* Re: SMTC support status in latest git head.
From: Kevin D. Kissell @ 2010-12-16 19:58 UTC
To: STUART VENTERS; +Cc: anoop.pa, linux-mips, Anoop_P.A

Ralf tells me that this message got blocked by the LMO server due to
HTML content. So here it is again, textier.

On 12/16/10 11:24, Kevin D. Kissell wrote:
> On 12/16/10 07:37, STUART VENTERS wrote:
> > Two other possible clues:
> >
> > The EVP is clear in the MVPControl register.
> > Does this say that only VPE0, T0 gets to run?

That's correct. In the maxtcs=1/maxvpes=1 boot state, it wouldn't
matter. It's just possible that setting EVP is conditional on more than
one VPE being used, but that's not the way I remember it.

> > Also the EXCPT bits in VPEControl for VPE1 indicate a Gating Storage
> > Exception dispatch.
> > But that seems to conflict with the EVP bit above.

I don't have a copy of the ASE spec handy to see whether those bits have
a defined power-on value, but particularly if maxvpes=1 was set at boot
time, I would expect VPE1's registers to be in a partly random power-up
state.

> > Perhaps these are an artifact of getting to a good state to dump
> > things out.

As per my previous mail, I looked at the MT register dump source, and it
really does pull values directly out of registers and doesn't depend on
having a sane kernel stack frame. The exceptions to that rule are the
reported value for TCStatus of the executing TC, which is based on the
perhaps-now-broken assumption that local_irq_save(flags) stores the
*entire* pre-invocation value of the TCStatus register in the flags
variable, and the reported value for MVPControl, which is based on the
assumption that dvpe() returns the pre-invocation value of MVPControl.
Break those assumptions, and you'll get inconsistent state dumps like
this, and very possibly incorrect execution. Particularly if what was
done effectively replaces the SMTC-specific implementation of
local_irq_save()/local_irq_restore() with something that uses the
generic MIPS32R2 atomic interrupt enable/disable instructions. That
would have been a *very* bad idea...

Regards,

Kevin K.
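
The second assumption in the message above, that dvpe() hands back the pre-invocation MVPControl, bears directly on the question about EVP reading as clear. A minimal sketch of why, using a simulated register rather than real CP0 access: all `sim_*` names and `dump_reported_mvpcontrol()` are hypothetical, and EVP is taken to be bit 0 of MVPControl, which is an assumption here rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: EVP is bit 0 of MVPControl. */
#define MVPCONTROL_EVP 0x1u

/* Simulated MVPControl; starts with multi-VPE execution enabled. */
uint32_t sim_mvpcontrol = MVPCONTROL_EVP;

/* Hypothetical model of dvpe(): clear EVP, return the previous value. */
uint32_t sim_dvpe(void)
{
    uint32_t prev = sim_mvpcontrol;
    sim_mvpcontrol &= ~MVPCONTROL_EVP;
    return prev;
}

/* Hypothetical model of evpe(): set EVP again. */
void sim_evpe(void)
{
    sim_mvpcontrol |= MVPCONTROL_EVP;
}

/*
 * Value a dump routine would report for MVPControl.
 * use_saved != 0: report what dvpe() returned (the pre-dump state).
 * use_saved == 0: re-read the register while the dump is running.
 */
uint32_t dump_reported_mvpcontrol(int use_saved)
{
    uint32_t saved = sim_dvpe();
    /* While the dump runs, EVP is always clear in the register itself. */
    assert((sim_mvpcontrol & MVPCONTROL_EVP) == 0);
    uint32_t reported = use_saved ? saved : sim_mvpcontrol;
    if (saved & MVPCONTROL_EVP)
        sim_evpe();   /* re-enable only if it was enabled before the dump */
    return reported;
}
```

In this model, a dump that prints the saved value shows EVP as it was before the dump started, while one that re-reads the register always shows EVP clear, whatever the system was doing beforehand.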
* Re: SMTC support status in latest git head.
From: Kevin D. Kissell @ 2010-12-17 21:35 UTC
To: anoop.pa; +Cc: STUART VENTERS, linux-mips, Anoop_P.A

So, Anoop, if you get a minute for this any time in the next day or so
(after which I'll have very limited net access until next year), could
you please do an <mumble>-mips<mumble>-objdump --disassemble of your
kernel image (or even just the mips-mt.o module) from a failing kernel
build and post the disassembly of mips_mt_regdump()? The confirmation
or refutation of the theory about local_irq_save() no longer being built
correctly for SMTC would be within the first few instructions...

/K.
* Re: SMTC support status in latest git head.
From: Anoop P A @ 2010-12-20 10:44 UTC
To: Kevin D. Kissell; +Cc: STUART VENTERS, linux-mips, Anoop_P.A

Hi Kevin,

Please find the disassembly of mips_mt_regdump() below.

Thanks
Anoop

Disassembly of section .text:

00000000 <mips_mt_regdump>:
   0:  27bdffb8  addiu sp,sp,-72
   4:  00802821  move a1,a0
   8:  afbf0044  sw ra,68(sp)
   c:  afbe0040  sw s8,64(sp)
  10:  afb7003c  sw s7,60(sp)
  14:  afb60038  sw s6,56(sp)
  18:  afb50034  sw s5,52(sp)
  1c:  afb40030  sw s4,48(sp)
  20:  afb3002c  sw s3,44(sp)
  24:  afb20028  sw s2,40(sp)
  28:  afb10024  sw s1,36(sp)
  2c:  afb00020  sw s0,32(sp)
  30:  40141001  mfc0 s4,c0_tcstatus
  34:  36810400  ori at,s4,0x400
  38:  40811001  mtc0 at,c0_tcstatus
  3c:  32940400  andi s4,s4,0x400
  40:  000000c0  ehb
  44:  41610001  dvpe at
  48:  0020a821  move s5,at
  4c:  000000c0  ehb
  50:  3c020000  lui v0,0x0
  54:  24420060  addiu v0,v0,96
  58:  00400408  jr.hb v0
  5c:  00000000  nop
  60:  3c040000  lui a0,0x0
  64:  24840000  addiu a0,a0,0
  68:  0c000000  jal 0 <mips_mt_regdump>
  6c:  afa50010  sw a1,16(sp)
  70:  3c040000  lui a0,0x0
  74:  0c000000  jal 0 <mips_mt_regdump>
  78:  24840000  addiu a0,a0,0
  7c:  8fa50010  lw a1,16(sp)
  80:  3c040000  lui a0,0x0
  84:  0c000000  jal 0 <mips_mt_regdump>
  88:  24840000  addiu a0,a0,0
  8c:  3c040000  lui a0,0x0
  90:  24840000  addiu a0,a0,0
  94:  0c000000  jal 0 <mips_mt_regdump>
  98:  02a02821  move a1,s5
  9c:  40110002  mfc0 s1,c0_mvpconf0
  a0:  3c040000  lui a0,0x0
  a4:  02202821  move a1,s1
  a8:  0c000000  jal 0 <mips_mt_regdump>
  ac:  24840000  addiu a0,a0,0
  b0:  3c040000  lui a0,0x0
  b4:  0c000000  jal 0 <mips_mt_regdump>
  b8:  24840000  addiu a0,a0,0
  bc:  7e331a80  ext s3,s1,0xa,0x4
  c0:  3c090000  lui t1,0x0
  c4:  323100ff  andi s1,s1,0xff
  c8:  3c080000  lui t0,0x0
  cc:  3c030000  lui v1,0x0
  d0:  3c1e0000  lui s8,0x0
  d4:  3c170000  lui s7,0x0
  d8:  3c160000  lui s6,0x0
  dc:  3c0a0000  lui t2,0x0
  e0:  26730001  addiu s3,s3,1
  e4:  26310001  addiu s1,s1,1
  e8:  00008021  move s0,zero
  ec:  2412ff00  li s2,-256
  f0:  25290000  addiu t1,t1,0
  f4:  25080000  addiu t0,t0,0
  f8:  24630000  addiu v1,v1,0
  fc:  27de0000  addiu s8,s8,0
 100:  26f70000  addiu s7,s7,0
 104:  26d60000  addiu s6,s6,0
 108:  254a0000  addiu t2,t2,0
 10c:  00001021  move v0,zero
 110:  40040801  mfc0 a0,c0_vpecontrol
 114:  00922024  and a0,a0,s2
 118:  00442025  or a0,v0,a0
 11c:  40840801  mtc0 a0,c0_vpecontrol
 120:  000000c0  ehb
 124:  41020802  mftc0 at,c0_tcbind
 128:  00202021  move a0,at
 12c:  24420001  addiu v0,v0,1
 130:  3084000f  andi a0,a0,0xf
 134:  12040031  beq s0,a0,1fc <mips_mt_regdump+0x1fc>
 138:  0051282a  slt a1,v0,s1
 13c:  14a0fff4  bnez a1,110 <mips_mt_regdump+0x110>
 140:  00000000  nop
 144:  26100001  addiu s0,s0,1
 148:  0213102a  slt v0,s0,s3
 14c:  1440fff0  bnez v0,110 <mips_mt_regdump+0x110>
 150:  00001021  move v0,zero
 154:  3c040000  lui a0,0x0
 158:  24840000  addiu a0,a0,0
 15c:  3c1e0000  lui s8,0x0
 160:  3c170000  lui s7,0x0
 164:  3c160000  lui s6,0x0
 168:  3c130000  lui s3,0x0
 16c:  0c000000  jal 0 <mips_mt_regdump>
 170:  3c120000  lui s2,0x0
 174:  00008021  move s0,zero
 178:  27de0000  addiu s8,s8,0
 17c:  26f70000  addiu s7,s7,0
 180:  26d60000  addiu s6,s6,0
 184:  26730000  addiu s3,s3,0
 188:  26520000  addiu s2,s2,0
 18c:  40020801  mfc0 v0,c0_vpecontrol
 190:  2403ff00  li v1,-256
 194:  00431024  and v0,v0,v1
 198:  02021025  or v0,s0,v0
 19c:  40820801  mtc0 v0,c0_vpecontrol
 1a0:  000000c0  ehb
 1a4:  41020802  mftc0 at,c0_tcbind
 1a8:  00201821  move v1,at
 1ac:  40021002  mfc0 v0,c0_tcbind
 1b0:  1062003f  beq v1,v0,2b0 <mips_mt_regdump+0x2b0>
 1b4:  00000000  nop
 1b8:  41020804  mftc0 at,c0_tchalt
 1bc:  00201821  move v1,at
 1c0:  24020001  li v0,1
 1c4:  00400821  move at,v0
 1c8:  41811004  mttc0 at,c0_tchalt
 1cc:  41020801  mftc0 at,c0_tcstatus
 1d0:  00203021  move a2,at
 1d4:  3c040000  lui a0,0x0
 1d8:  02002821  move a1,s0
 1dc:  24840000  addiu a0,a0,0
 1e0:  afa3001c  sw v1,28(sp)
 1e4:  0c000000  jal 0 <mips_mt_regdump>
 1e8:  afa60010  sw a2,16(sp)
 1ec:  8fa60010  lw a2,16(sp)
 1f0:  8fa3001c  lw v1,28(sp)
 1f4:  080000b2  j 2c8 <mips_mt_regdump+0x2c8>
 1f8:  00c02821  move a1,a2
 1fc:  01202021  move a0,t1
 200:  02002821  move a1,s0
 204:  afa3001c  sw v1,28(sp)
 208:  afa80014  sw t0,20(sp)
 20c:  afa90010  sw t1,16(sp)
 210:  0c000000  jal 0 <mips_mt_regdump>
 214:  afaa0018  sw t2,24(sp)
 218:  41010801  mftc0 at,c0_vpecontrol
 21c:  00202821  move a1,at
 220:  8fa80014  lw t0,20(sp)
 224:  0c000000  jal 0 <mips_mt_regdump>
 228:  01002021  move a0,t0
 22c:  41010802  mftc0 at,c0_vpeconf0
 230:  00202821  move a1,at
 234:  8fa3001c  lw v1,28(sp)
 238:  0c000000  jal 0 <mips_mt_regdump>
 23c:  00602021  move a0,v1
 240:  410c0800  mftc0 at,c0_status
 244:  00203021  move a2,at
 248:  03c02021  move a0,s8
 24c:  0c000000  jal 0 <mips_mt_regdump>
 250:  02002821  move a1,s0
 254:  410e0800  mftc0 at,c0_epc
 258:  00203021  move a2,at
 25c:  410e0800  mftc0 at,c0_epc
 260:  00203821  move a3,at
 264:  02e02021  move a0,s7
 268:  0c000000  jal 0 <mips_mt_regdump>
 26c:  02002821  move a1,s0
 270:  410d0800  mftc0 at,c0_cause
 274:  00203021  move a2,at
 278:  02c02021  move a0,s6
 27c:  0c000000  jal 0 <mips_mt_regdump>
 280:  02002821  move a1,s0
 284:  41100807  mftc0 at,$16,7
 288:  00203021  move a2,at
 28c:  8faa0018  lw t2,24(sp)
 290:  02002821  move a1,s0
 294:  0c000000  jal 0 <mips_mt_regdump>
 298:  01402021  move a0,t2
 29c:  8fa3001c  lw v1,28(sp)
 2a0:  8fa80014  lw t0,20(sp)
 2a4:  8fa90010  lw t1,16(sp)
 2a8:  08000051  j 144 <mips_mt_regdump+0x144>
 2ac:  8faa0018  lw t2,24(sp)
 2b0:  3c040000  lui a0,0x0
 2b4:  02002821  move a1,s0
 2b8:  0c000000  jal 0 <mips_mt_regdump>
 2bc:  24840000  addiu a0,a0,0
 2c0:  00001821  move v1,zero
 2c4:  02802821  move a1,s4
 2c8:  03c02021  move a0,s8
 2cc:  0c000000  jal 0 <mips_mt_regdump>
 2d0:  afa3001c  sw v1,28(sp)
 2d4:  41020802  mftc0 at,c0_tcbind
 2d8:  00202821  move a1,at
 2dc:  0c000000  jal 0 <mips_mt_regdump>
 2e0:  02e02021  move a0,s7
 2e4:  41020803  mftc0 at,c0_tcrestart
 2e8:  00202821  move a1,at
 2ec:  41020803  mftc0 at,c0_tcrestart
 2f0:  00203021  move a2,at
 2f4:  0c000000  jal 0 <mips_mt_regdump>
 2f8:  02c02021  move a0,s6
 2fc:  8fa3001c  lw v1,28(sp)
 300:  02602021  move a0,s3
 304:  0c000000  jal 0 <mips_mt_regdump>
 308:  00602821  move a1,v1
 30c:  41020805  mftc0 at,c0_tccontext
 310:  00202821  move a1,at
 314:  0c000000  jal 0 <mips_mt_regdump>
 318:  02402021  move a0,s2
 31c:  8fa3001c  lw v1,28(sp)
 320:  14600003  bnez v1,330 <mips_mt_regdump+0x330>
 324:  00001021  move v0,zero
 328:  00400821  move at,v0
 32c:  41811004  mttc0 at,c0_tchalt
 330:  26100001  addiu s0,s0,1
 334:  0211102a  slt v0,s0,s1
 338:  1440ff94  bnez v0,18c <mips_mt_regdump+0x18c>
 33c:  00000000  nop
 340:  0c000000  jal 0 <mips_mt_regdump>
 344:  32b50001  andi s5,s5,0x1
 348:  3c040000  lui a0,0x0
 34c:  0c000000  jal 0 <mips_mt_regdump>
 350:  24840000  addiu a0,a0,0
 354:  12a00004  beqz s5,368 <mips_mt_regdump+0x368>
 358:  32820400  andi v0,s4,0x400
 35c:  41600021  evpe
 360:  000000c0  ehb
 364:  32820400  andi v0,s4,0x400
 368:  14400003  bnez v0,378 <mips_mt_regdump+0x378>
 36c:  00000000  nop
 370:  0c000000  jal 0 <mips_mt_regdump>
 374:  00000000  nop
 378:  40011001  mfc0 at,c0_tcstatus
 37c:  32940400  andi s4,s4,0x400
 380:  34210400  ori at,at,0x400
 384:  38210400  xori at,at,0x400
 388:  0281a025  or s4,s4,at
 38c:  40941001  mtc0 s4,c0_tcstatus
 390:  000000c0  ehb
 394:  8fbf0044  lw ra,68(sp)
 398:  8fbe0040  lw s8,64(sp)
 39c:  8fb7003c  lw s7,60(sp)
 3a0:  8fb60038  lw s6,56(sp)
 3a4:  8fb50034  lw s5,52(sp)
 3a8:  8fb40030  lw s4,48(sp)
 3ac:  8fb3002c  lw s3,44(sp)
 3b0:  8fb20028  lw s2,40(sp)
 3b4:  8fb10024  lw s1,36(sp)
 3b8:  8fb00020  lw s0,32(sp)
 3bc:  03e00008  jr ra
 3c0:  27bd0048  addiu sp,sp,72
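
For readers following the listing, the instructions at offsets 0x30 to 0x3c and 0x378 to 0x38c look like the inlined interrupt save and restore, and the `move a1,s4` at 0x2c4 appears to pass the saved value on as the TCStatus reported for the executing TC. The sketch below is a plain C model of just those instructions against a simulated register. The `sim_*` names are hypothetical, the 0x400 mask is named IXMT on the assumption that it is the TCStatus interrupt-exempt bit, and 0x18102000 is an arbitrary made-up register value.

```c
#include <assert.h>
#include <stdint.h>

#define TCSTATUS_IXMT 0x400u   /* the 0x400 mask used throughout the listing */

uint32_t sim_tcstatus;         /* stand-in for the CP0 TCStatus register */

/* Model of offsets 0x30..0x3c. */
uint32_t sim_irq_save(void)
{
    uint32_t s4 = sim_tcstatus;            /* mfc0 s4,c0_tcstatus         */
    sim_tcstatus = s4 | TCSTATUS_IXMT;     /* ori at,s4,0x400 ; mtc0 at   */
    return s4 & TCSTATUS_IXMT;             /* andi s4,s4,0x400            */
}

/* Model of offsets 0x378..0x38c. */
void sim_irq_restore(uint32_t flags)
{
    uint32_t at = sim_tcstatus;            /* mfc0 at,c0_tcstatus         */
    flags &= TCSTATUS_IXMT;                /* andi s4,s4,0x400            */
    at |= TCSTATUS_IXMT;                   /* ori at,at,0x400             */
    at ^= TCSTATUS_IXMT;                   /* xori: the pair clears IXMT  */
    sim_tcstatus = flags | at;             /* or s4,s4,at ; mtc0 s4       */
}

/* Usage: the pair restores TCStatus, but flags only ever holds one bit. */
void sim_roundtrip_demo(void)
{
    sim_tcstatus = 0x18102000u;            /* arbitrary value, IXMT clear */
    uint32_t flags = sim_irq_save();
    assert(flags == 0);                    /* the other bits are not kept */
    assert(sim_tcstatus == 0x18102400u);   /* interrupts inhibited        */
    sim_irq_restore(flags);
    assert(sim_tcstatus == 0x18102000u);   /* original value is back      */
}
```

Under this reading the save/restore pair itself behaves correctly; what changes is that the saved value carries only the 0x400 bit, so any caller that prints it as a full TCStatus will show a value that is almost entirely zero.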
[parent not found: <4D10F7A9.1020306@paralogos.com>]
* RE: SMTC support status in latest git head.
From: Anoop P.A. @ 2010-12-21 20:06 UTC
To: Kevin D. Kissell, Anoop P A; +Cc: STUART VENTERS, linux-mips

OK. I will check it.

BTW, the following patch is responsible for the irq change:
http://git.linux-mips.org/?p=linux.git;a=commitdiff;h=df9ee29270c11dba7d0fe0b83ce47a4d8e8d2101

Thanks
Anoop

________________________________________
From: Kevin D. Kissell [mailto:kevink@paralogos.com]
Sent: Wednesday, December 22, 2010 12:23 AM
To: Anoop P A
Cc: STUART VENTERS; linux-mips@linux-mips.org; Anoop P.A.
Subject: Re: SMTC support status in latest git head.

OK, I see why the MT register dump isn't giving us useful information.
It's not clear that it's at the root of your functional problems,
though. Apparently, somebody decided that it was unwholesome to
propagate anything other than the previous interrupt enable state in the
flags variable passed between irq_save() and irq_restore(). I agree
philosophically, but it does break the MT register dump function. And
I'm quite sure that there were other bits of SMTC code that knew that it
was a TCStatus value, at least in the earliest versions of the code.

I'm not a gitweb power user, and I haven't been able to figure out how
to determine when the "andi \\result 0x400" on or about line 138 of
irqflags.h (at least that's where it is in the head of tree) was
checked in. If it's at the boundary between working and non-working
versions for SMTC, it might be the cause of the problems, but it may
well not be responsible for anything other than the problem with
reporting the value in the MT register dump - which really ought to be
fixed.

I'm in a small village in France for the holidays with no git/build
system at my disposal, but I think that if you were to tweak mips-mt.c
at line 103 to change the

    tcstatval = flags; /* And pre-dump TCStatus is flags */

to something more like

    /* Pre-dump TCStatus Interrupt Inhibit bit is in flags variable */
    tcstatval = (read_c0_tcstatus() & ~0x400) | flags;

it should fix the dump.

Regards,

Kevin K.
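
The one-line change suggested for mips-mt.c above can be checked in isolation. The sketch below takes the register value as a parameter instead of calling read_c0_tcstatus(), so it runs anywhere. `rebuild_predump_tcstatus()` is a hypothetical name, and the test values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TCSTATUS_IXMT 0x400u   /* the bit that now travels in 'flags' */

/*
 * tcstatus_now: TCStatus as read inside the dump, where the save
 *               sequence has already forced the 0x400 bit on.
 * flags:        value produced by the interrupt save, expected to hold
 *               nothing but the pre-dump state of the 0x400 bit.
 * Returns the TCStatus value as it stood before the dump began, under
 * the assumption that no other TCStatus field changed in between.
 */
uint32_t rebuild_predump_tcstatus(uint32_t tcstatus_now, uint32_t flags)
{
    assert((flags & ~TCSTATUS_IXMT) == 0);   /* only one bit may be set */
    return (tcstatus_now & ~TCSTATUS_IXMT) | flags;
}
```

With a current reading of 0x18102400 and flags of 0, this yields 0x18102000: every field comes from the live register except the inhibit bit, which comes from the saved flags.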
* RE: SMTC support status in latest git head. @ 2010-12-21 20:06 ` Anoop P.A. 0 siblings, 0 replies; 68+ messages in thread From: Anoop P.A. @ 2010-12-21 20:06 UTC (permalink / raw) To: Kevin D. Kissell, Anoop P A; +Cc: STUART VENTERS, linux-mips

OK. I will check it.

BTW following patch is responsible for irq change.

http://git.linux-mips.org/?p=linux.git;a=commitdiff;h=df9ee29270c11dba7d0fe0b83ce47a4d8e8d2101

Thanks
Anoop
________________________________________
From: Kevin D. Kissell [mailto:kevink@paralogos.com]
Sent: Wednesday, December 22, 2010 12:23 AM
To: Anoop P A
Cc: STUART VENTERS; linux-mips@linux-mips.org; Anoop P.A.
Subject: Re: SMTC support status in latest git head.

OK, I see why the MT register dump isn't giving us useful information. It's not clear that it's at the root of your functional problems, though. Apparently, somebody decided that it was unwholesome to propagate anything other than the previous interrupt enable state in the flags variable passed between irq_save() and irq_restore(). I agree philosophically, but it does break the MT register dump function. And I'm quite sure that there were other bits of SMTC code that knew that it was a TCStatus value, at least in the earliest versions of the code. I'm not a gitweb power user, but I haven't been able to figure out how to determine when the "andi \\result 0x400" on or about line 138 of irqflags.h (at least that's where it is in the head of tree) was checked-in. If it's at the boundary between working and non-working versions for SMTC, it might be the cause of the problems, but it may well not be responsible for anything other than the problem with reporting the value in the MT register dump - which really ought to be fixed.
I'm in a small village in France for the holidays with no git/build system at my disposal, but I think that if you were to tweak mips-mt.c at line 103 to change the

    tcstatval = flags; /* And pre-dump TCStatus is flags */

to something more like

    /* Pre-dump TCStatus Interrupt Inhibit bit is in flags variable */
    tcstatval = (read_c0_tcstatus() & ~0x400) | flags;

should fix the dump.

Regards,

Kevin K.

On 12/20/10 2:44 AM, Anoop P A wrote:
Hi Kevin,

Please find disassembly for mips_mt_reg_dump

Thanks
Anoop

Disassembly of section .text:

00000000 <mips_mt_regdump>:
   0: 27bdffb8 addiu sp,sp,-72
   4: 00802821 move a1,a0
   8: afbf0044 sw ra,68(sp)
   c: afbe0040 sw s8,64(sp)
  10: afb7003c sw s7,60(sp)
  14: afb60038 sw s6,56(sp)
  18: afb50034 sw s5,52(sp)
  1c: afb40030 sw s4,48(sp)
  20: afb3002c sw s3,44(sp)
  24: afb20028 sw s2,40(sp)
  28: afb10024 sw s1,36(sp)
  2c: afb00020 sw s0,32(sp)
  30: 40141001 mfc0 s4,c0_tcstatus
  34: 36810400 ori at,s4,0x400
  38: 40811001 mtc0 at,c0_tcstatus
  3c: 32940400 andi s4,s4,0x400
  40: 000000c0 ehb
  44: 41610001 dvpe at
  48: 0020a821 move s5,at
  4c: 000000c0 ehb
  50: 3c020000 lui v0,0x0
  54: 24420060 addiu v0,v0,96
  58: 00400408 jr.hb v0
  5c: 00000000 nop
  60: 3c040000 lui a0,0x0
  64: 24840000 addiu a0,a0,0
  68: 0c000000 jal 0 <mips_mt_regdump>
  6c: afa50010 sw a1,16(sp)
  70: 3c040000 lui a0,0x0
  74: 0c000000 jal 0 <mips_mt_regdump>
  78: 24840000 addiu a0,a0,0
  7c: 8fa50010 lw a1,16(sp)
  80: 3c040000 lui a0,0x0
  84: 0c000000 jal 0 <mips_mt_regdump>
  88: 24840000 addiu a0,a0,0
  8c: 3c040000 lui a0,0x0
  90: 24840000 addiu a0,a0,0
  94: 0c000000 jal 0 <mips_mt_regdump>
  98: 02a02821 move a1,s5
  9c: 40110002 mfc0 s1,c0_mvpconf0
  a0: 3c040000 lui a0,0x0
  a4: 02202821 move a1,s1
  a8: 0c000000 jal 0 <mips_mt_regdump>
  ac: 24840000 addiu a0,a0,0
  b0: 3c040000 lui a0,0x0
  b4: 0c000000 jal 0 <mips_mt_regdump>
  b8: 24840000 addiu a0,a0,0
  bc: 7e331a80 ext s3,s1,0xa,0x4
  c0: 3c090000 lui t1,0x0
  c4: 323100ff andi s1,s1,0xff
  c8: 3c080000 lui t0,0x0
  cc: 3c030000 lui v1,0x0
  d0: 3c1e0000 lui s8,0x0
  d4: 3c170000 lui s7,0x0
  d8: 3c160000 lui s6,0x0
  dc: 3c0a0000 lui t2,0x0
  e0: 26730001 addiu s3,s3,1
  e4: 26310001 addiu s1,s1,1
  e8: 00008021 move s0,zero
  ec: 2412ff00 li s2,-256
  f0: 25290000 addiu t1,t1,0
  f4: 25080000 addiu t0,t0,0
  f8: 24630000 addiu v1,v1,0
  fc: 27de0000 addiu s8,s8,0
 100: 26f70000 addiu s7,s7,0
 104: 26d60000 addiu s6,s6,0
 108: 254a0000 addiu t2,t2,0
 10c: 00001021 move v0,zero
 110: 40040801 mfc0 a0,c0_vpecontrol
 114: 00922024 and a0,a0,s2
 118: 00442025 or a0,v0,a0
 11c: 40840801 mtc0 a0,c0_vpecontrol
 120: 000000c0 ehb
 124: 41020802 mftc0 at,c0_tcbind
 128: 00202021 move a0,at
 12c: 24420001 addiu v0,v0,1
 130: 3084000f andi a0,a0,0xf
 134: 12040031 beq s0,a0,1fc <mips_mt_regdump+0x1fc>
 138: 0051282a slt a1,v0,s1
 13c: 14a0fff4 bnez a1,110 <mips_mt_regdump+0x110>
 140: 00000000 nop
 144: 26100001 addiu s0,s0,1
 148: 0213102a slt v0,s0,s3
 14c: 1440fff0 bnez v0,110 <mips_mt_regdump+0x110>
 150: 00001021 move v0,zero
 154: 3c040000 lui a0,0x0
 158: 24840000 addiu a0,a0,0
 15c: 3c1e0000 lui s8,0x0
 160: 3c170000 lui s7,0x0
 164: 3c160000 lui s6,0x0
 168: 3c130000 lui s3,0x0
 16c: 0c000000 jal 0 <mips_mt_regdump>
 170: 3c120000 lui s2,0x0
 174: 00008021 move s0,zero
 178: 27de0000 addiu s8,s8,0
 17c: 26f70000 addiu s7,s7,0
 180: 26d60000 addiu s6,s6,0
 184: 26730000 addiu s3,s3,0
 188: 26520000 addiu s2,s2,0
 18c: 40020801 mfc0 v0,c0_vpecontrol
 190: 2403ff00 li v1,-256
 194: 00431024 and v0,v0,v1
 198: 02021025 or v0,s0,v0
 19c: 40820801 mtc0 v0,c0_vpecontrol
 1a0: 000000c0 ehb
 1a4: 41020802 mftc0 at,c0_tcbind
 1a8: 00201821 move v1,at
 1ac: 40021002 mfc0 v0,c0_tcbind
 1b0: 1062003f beq v1,v0,2b0 <mips_mt_regdump+0x2b0>
 1b4: 00000000 nop
 1b8: 41020804 mftc0 at,c0_tchalt
 1bc: 00201821 move v1,at
 1c0: 24020001 li v0,1
 1c4: 00400821 move at,v0
 1c8: 41811004 mttc0 at,c0_tchalt
 1cc: 41020801 mftc0 at,c0_tcstatus
 1d0: 00203021 move a2,at
 1d4: 3c040000 lui a0,0x0
 1d8: 02002821 move a1,s0
 1dc: 24840000 addiu a0,a0,0
 1e0: afa3001c sw v1,28(sp)
 1e4: 0c000000 jal 0 <mips_mt_regdump>
 1e8: afa60010 sw a2,16(sp)
 1ec: 8fa60010 lw a2,16(sp)
 1f0: 8fa3001c lw v1,28(sp)
 1f4: 080000b2 j 2c8 <mips_mt_regdump+0x2c8>
 1f8: 00c02821 move a1,a2
 1fc: 01202021 move a0,t1
 200: 02002821 move a1,s0
 204: afa3001c sw v1,28(sp)
 208: afa80014 sw t0,20(sp)
 20c: afa90010 sw t1,16(sp)
 210: 0c000000 jal 0 <mips_mt_regdump>
 214: afaa0018 sw t2,24(sp)
 218: 41010801 mftc0 at,c0_vpecontrol
 21c: 00202821 move a1,at
 220: 8fa80014 lw t0,20(sp)
 224: 0c000000 jal 0 <mips_mt_regdump>
 228: 01002021 move a0,t0
 22c: 41010802 mftc0 at,c0_vpeconf0
 230: 00202821 move a1,at
 234: 8fa3001c lw v1,28(sp)
 238: 0c000000 jal 0 <mips_mt_regdump>
 23c: 00602021 move a0,v1
 240: 410c0800 mftc0 at,c0_status
 244: 00203021 move a2,at
 248: 03c02021 move a0,s8
 24c: 0c000000 jal 0 <mips_mt_regdump>
 250: 02002821 move a1,s0
 254: 410e0800 mftc0 at,c0_epc
 258: 00203021 move a2,at
 25c: 410e0800 mftc0 at,c0_epc
 260: 00203821 move a3,at
 264: 02e02021 move a0,s7
 268: 0c000000 jal 0 <mips_mt_regdump>
 26c: 02002821 move a1,s0
 270: 410d0800 mftc0 at,c0_cause
 274: 00203021 move a2,at
 278: 02c02021 move a0,s6
 27c: 0c000000 jal 0 <mips_mt_regdump>
 280: 02002821 move a1,s0
 284: 41100807 mftc0 at,$16,7
 288: 00203021 move a2,at
 28c: 8faa0018 lw t2,24(sp)
 290: 02002821 move a1,s0
 294: 0c000000 jal 0 <mips_mt_regdump>
 298: 01402021 move a0,t2
 29c: 8fa3001c lw v1,28(sp)
 2a0: 8fa80014 lw t0,20(sp)
 2a4: 8fa90010 lw t1,16(sp)
 2a8: 08000051 j 144 <mips_mt_regdump+0x144>
 2ac: 8faa0018 lw t2,24(sp)
 2b0: 3c040000 lui a0,0x0
 2b4: 02002821 move a1,s0
 2b8: 0c000000 jal 0 <mips_mt_regdump>
 2bc: 24840000 addiu a0,a0,0
 2c0: 00001821 move v1,zero
 2c4: 02802821 move a1,s4
 2c8: 03c02021 move a0,s8
 2cc: 0c000000 jal 0 <mips_mt_regdump>
 2d0: afa3001c sw v1,28(sp)
 2d4: 41020802 mftc0 at,c0_tcbind
 2d8: 00202821 move a1,at
 2dc: 0c000000 jal 0 <mips_mt_regdump>
 2e0: 02e02021 move a0,s7
 2e4: 41020803 mftc0 at,c0_tcrestart
 2e8: 00202821 move a1,at
 2ec: 41020803 mftc0 at,c0_tcrestart
 2f0: 00203021 move a2,at
 2f4: 0c000000 jal 0 <mips_mt_regdump>
 2f8: 02c02021 move a0,s6
 2fc: 8fa3001c lw v1,28(sp)
 300: 02602021 move a0,s3
 304: 0c000000 jal 0 <mips_mt_regdump>
 308: 00602821 move a1,v1
 30c: 41020805 mftc0 at,c0_tccontext
 310: 00202821 move a1,at
 314: 0c000000 jal 0 <mips_mt_regdump>
 318: 02402021 move a0,s2
 31c: 8fa3001c lw v1,28(sp)
 320: 14600003 bnez v1,330 <mips_mt_regdump+0x330>
 324: 00001021 move v0,zero
 328: 00400821 move at,v0
 32c: 41811004 mttc0 at,c0_tchalt
 330: 26100001 addiu s0,s0,1
 334: 0211102a slt v0,s0,s1
 338: 1440ff94 bnez v0,18c <mips_mt_regdump+0x18c>
 33c: 00000000 nop
 340: 0c000000 jal 0 <mips_mt_regdump>
 344: 32b50001 andi s5,s5,0x1
 348: 3c040000 lui a0,0x0
 34c: 0c000000 jal 0 <mips_mt_regdump>
 350: 24840000 addiu a0,a0,0
 354: 12a00004 beqz s5,368 <mips_mt_regdump+0x368>
 358: 32820400 andi v0,s4,0x400
 35c: 41600021 evpe
 360: 000000c0 ehb
 364: 32820400 andi v0,s4,0x400
 368: 14400003 bnez v0,378 <mips_mt_regdump+0x378>
 36c: 00000000 nop
 370: 0c000000 jal 0 <mips_mt_regdump>
 374: 00000000 nop
 378: 40011001 mfc0 at,c0_tcstatus
 37c: 32940400 andi s4,s4,0x400
 380: 34210400 ori at,at,0x400
 384: 38210400 xori at,at,0x400
 388: 0281a025 or s4,s4,at
 38c: 40941001 mtc0 s4,c0_tcstatus
 390: 000000c0 ehb
 394: 8fbf0044 lw ra,68(sp)
 398: 8fbe0040 lw s8,64(sp)
 39c: 8fb7003c lw s7,60(sp)
 3a0: 8fb60038 lw s6,56(sp)
 3a4: 8fb50034 lw s5,52(sp)
 3a8: 8fb40030 lw s4,48(sp)
 3ac: 8fb3002c lw s3,44(sp)
 3b0: 8fb20028 lw s2,40(sp)
 3b4: 8fb10024 lw s1,36(sp)
 3b8: 8fb00020 lw s0,32(sp)
 3bc: 03e00008 jr ra
 3c0: 27bd0048 addiu sp,sp,72

On Sat, Dec 18, 2010 at 3:05 AM, Kevin D. Kissell <kevink@paralogos.com> wrote:
So, Anoop, if you get a minute for this any time in the next day or so (after which I'll have very limited net access until next year), could you please do an <mumble>-mips<mumble>-objdump --disassemble of your kernel image (or even just the mips-mt.o module) from a failing kernel build and post the disassembly of mips_mt_regdump()?
The confirmation or refutation of the theory about local_irq_save() no longer being built correctly for SMTC would be within the first few instructions...

/K.

On 12/16/10 11:58, Kevin D. Kissell wrote:
Ralf tells me that this message got blocked by the LMO server due to HTML content. So here it is again, textier.

On 12/16/10 11:24, Kevin D. Kissell wrote:
On 12/16/10 07:37, STUART VENTERS wrote:

Two other possible clues:

The EVP is clear in the MVPControl register.
Does this say that only VPE0, T0 gets to run?
That's correct. In the maxtcs=1/maxvpes=1 boot state, it wouldn't matter. It's just possible that setting EVP is conditional on more than one VPE being used, but that's not the way I remember it.

Also the EXCPT bits in VPEControl for VPE1 indicate a Gating Storage Exception dispatch.
But that seems to conflict with the EVP bit above.
I don't have a copy of the ASE spec handy to see whether those bits have a defined power-on value, but particularly if maxvpes=1 was set at boot time, I would expect VPE1's registers to be in a partly random power-up state.

Perhaps these are an artifact of getting to a good state to dump things out.
As per my previous mail, I looked at the MT register dump source, and it really does pull values directly out of registers and doesn't depend on having a sane kernel stack frame. The exceptions to that rule are the reported values for TCStatus of the executing TC, which is based on the perhaps-now-broken assumption that local_irq_save(flags) stores the *entire* pre-invocation value of the TCStatus register in the flags variable, and MVPcontrol, which is based on the assumption that dvpe() returns the pre-invocation value of MVPcontrol. Break those assumptions, and you'll get inconsistent state dumps like this, and very possibly incorrect execution. Particularly if what was done effectively replaces the SMTC-specific implementation of local_irq_save()/local_irq_restore() with something that uses the generic MIPS32R2 atomic interrupt enable/disable instructions. That would have been a *very* bad idea...

Regards,

Kevin K.

^ permalink raw reply [flat|nested] 68+ messages in thread
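[Editorial sketch, not part of the original mail.] The one-line change suggested for mips-mt.c in the message above is just bit arithmetic, and it can be tried in user space. The sketch below assumes, as the thread states, that 0x400 is the TCStatus interrupt-inhibit bit and that flags holds only that bit after the save. The function names, the macro name and the sample register value 0x18102000 are hypothetical; a plain integer stands in for read_c0_tcstatus(), so this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the thread: the SMTC irq-save path sets and
 * returns bit 0x400 of TCStatus (the interrupt-inhibit bit). */
#define TCSTATUS_IXMT 0x400u

/* Rebuild the pre-dump TCStatus value: start from the register as it
 * reads now (inhibit bit forced on by the save), clear that bit, then
 * put back whatever inhibit state the save recorded in flags.
 * The suggested kernel line ORs in flags directly; masking it here is
 * only a defensive choice for the sketch. */
static uint32_t predump_tcstatus(uint32_t current, uint32_t flags)
{
    return (current & ~TCSTATUS_IXMT) | (flags & TCSTATUS_IXMT);
}

/* Self-check with an arbitrary sample value. */
static void predump_selfcheck(void)
{
    /* Interrupts were not inhibited before the dump. */
    assert(predump_tcstatus(0x18102400u, 0x000u) == 0x18102000u);
    /* Interrupts were already inhibited before the dump. */
    assert(predump_tcstatus(0x18102400u, 0x400u) == 0x18102400u);
}
```

With the old assignment (tcstatval = flags) the dump would print only 0x0 or 0x400 for the running TC, which matches the uninformative output discussed above.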
* RE: SMTC support status in latest git head. @ 2010-12-21 20:29 ` Anoop P.A. 0 siblings, 0 replies; 68+ messages in thread From: Anoop P.A. @ 2010-12-21 20:29 UTC (permalink / raw) To: Anoop P.A., Kevin D. Kissell, Anoop P A; +Cc: STUART VENTERS, linux-mips

Sorry, I misunderstood the file. git blame shows that the "andi" has been around for quite some time.

49a89efb include/asm-mips/irqflags.h (Ralf Baechle 2007-10-11 23:46:15 +0100 128) __asm__(
df9ee292 arch/mips/include/asm/irqflags.h (David Howells 2010-10-07 14:08:55 +0100 129) " .macro arch_local_irq_save result
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 130) " .set push
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 131) " .set reorder
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 132) " .set noat
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 133) #ifdef CONFIG_MIPS_MT_SMTC
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 134) " mfc0 \\result, $2, 1
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 135) " ori $1, \\result, 0x400
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 136) " .set noreorder
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 137) " mtc0 $1, $2, 1
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 138) " andi \\result, \\result, 0x400
41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 139) #elif defined(CONFIG_CPU_MIPSR2)
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 140) " di \\result
15265251 include/asm-mips/interrupt.h (Maxime Bizon 2005-12-20 06:32:19 +0100 141) " andi \\result, 1
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 142) #else
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 143) " mfc0 \\result, $12
c226f260 include/asm-mips/interrupt.h (Atsushi Nemoto 2006-02-03 01:34:01 +0900 144) " ori $1, \\result, 0x1f
c226f260 include/asm-mips/interrupt.h (Atsushi Nemoto 2006-02-03 01:34:01 +0900 145) " xori $1, 0x1f
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 146) " .set noreorder
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 147) " mtc0 $1, $12
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 148) #endif
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 149) " irq_disable_hazard
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 150) " .set pop
ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 151) " .endm
^1da177e include/asm-mips/interrupt.h (Linus Torvalds 2005-04-16 15:20:36 -0700 152)

> -----Original Message-----
> From: linux-mips-bounce@linux-mips.org [mailto:linux-mips-bounce@linux-mips.org] On Behalf Of Anoop P.A.
> Sent: Wednesday, December 22, 2010 1:37 AM
> To: Kevin D. Kissell; Anoop P A
> Cc: STUART VENTERS; linux-mips@linux-mips.org
> Subject: RE: SMTC support status in latest git head.
>
>
> OK. I will check it.
>
> BTW following patch is responsible for irq change.
>
> http://git.linux-mips.org/?p=linux.git;a=commitdiff;h=df9ee29270c11dba7d0fe0b83ce47a4d8e8d2101
>
> Thanks
> Anoop
> ________________________________________
> From: Kevin D. Kissell [mailto:kevink@paralogos.com]
> Sent: Wednesday, December 22, 2010 12:23 AM
> To: Anoop P A
> Cc: STUART VENTERS; linux-mips@linux-mips.org; Anoop P.A.
> Subject: Re: SMTC support status in latest git head.
>
> OK, I see why the MT register dump isn't giving us useful information.
> It's not clear that it's at the root of your functional problems, though.
> Apparently, somebody decided that it was unwholesome to propagate anything > other than the previous interrupt enable state in the flags variable > passed between irq_save() and irq_restore(). I agree philosophically, but > it does break the MT register dump function. And I'm quite sure that > there were other bits of SMTC code that knew that it was a TCStatus value, > at least in the earliest versions of the code. I'm not a gitweb power > user, but I haven't been able to figure out how to determine when the > "andi \\result 0x400" on or about line 138 of irqflags.h (at least that's > where it is in the head of tree) was checked-in. If it's at the boundary > between working and non-working versions for SMTC, it might be the cause > of the problems, but it may well not be responsible for anything other > than the problem with reporting the value in > the MT register dump - which really ought to be fixed. > > I'm in a small village in France for the holidays with no git/build system > at my disposal, but I think that if you were to tweak mips-mt.c at line > 103 to change > the > > tcstatval = flags; /* And pre-dump TCStatus is flags */ > > > > to something more like > > > > /* Pre-dump TCStatus Interrupt Inhibit bit is in flags variable > */ > > tcstatval = (read_c0_tcstatus() & ~0x400) | flags; > > > > should fix the dump. > > Regards, > > Kevin K. 
> > On 12/20/10 2:44 AM, Anoop P A wrote: > Hi Kevin, > > Please find disassembly for mips_mt_reg_dump > > Thanks > Anoop > > Disassembly of section .text: > > 00000000 <mips_mt_regdump>: > 0: 27bdffb8 addiu sp,sp,-72 > 4: 00802821 move a1,a0 > 8: afbf0044 sw ra,68(sp) > c: afbe0040 sw s8,64(sp) > 10: afb7003c sw s7,60(sp) > 14: afb60038 sw s6,56(sp) > 18: afb50034 sw s5,52(sp) > 1c: afb40030 sw s4,48(sp) > 20: afb3002c sw s3,44(sp) > 24: afb20028 sw s2,40(sp) > 28: afb10024 sw s1,36(sp) > 2c: afb00020 sw s0,32(sp) > 30: 40141001 mfc0 s4,c0_tcstatus > 34: 36810400 ori at,s4,0x400 > 38: 40811001 mtc0 at,c0_tcstatus > 3c: 32940400 andi s4,s4,0x400 > 40: 000000c0 ehb > 44: 41610001 dvpe at > 48: 0020a821 move s5,at > 4c: 000000c0 ehb > 50: 3c020000 lui v0,0x0 > 54: 24420060 addiu v0,v0,96 > 58: 00400408 jr.hb v0 > 5c: 00000000 nop > 60: 3c040000 lui a0,0x0 > 64: 24840000 addiu a0,a0,0 > 68: 0c000000 jal 0 <mips_mt_regdump> > 6c: afa50010 sw a1,16(sp) > 70: 3c040000 lui a0,0x0 > 74: 0c000000 jal 0 <mips_mt_regdump> > 78: 24840000 addiu a0,a0,0 > 7c: 8fa50010 lw a1,16(sp) > 80: 3c040000 lui a0,0x0 > 84: 0c000000 jal 0 <mips_mt_regdump> > 88: 24840000 addiu a0,a0,0 > 8c: 3c040000 lui a0,0x0 > 90: 24840000 addiu a0,a0,0 > 94: 0c000000 jal 0 <mips_mt_regdump> > 98: 02a02821 move a1,s5 > 9c: 40110002 mfc0 s1,c0_mvpconf0 > a0: 3c040000 lui a0,0x0 > a4: 02202821 move a1,s1 > a8: 0c000000 jal 0 <mips_mt_regdump> > ac: 24840000 addiu a0,a0,0 > b0: 3c040000 lui a0,0x0 > b4: 0c000000 jal 0 <mips_mt_regdump> > b8: 24840000 addiu a0,a0,0 > bc: 7e331a80 ext s3,s1,0xa,0x4 > c0: 3c090000 lui t1,0x0 > c4: 323100ff andi s1,s1,0xff > c8: 3c080000 lui t0,0x0 > cc: 3c030000 lui v1,0x0 > d0: 3c1e0000 lui s8,0x0 > d4: 3c170000 lui s7,0x0 > d8: 3c160000 lui s6,0x0 > dc: 3c0a0000 lui t2,0x0 > e0: 26730001 addiu s3,s3,1 > e4: 26310001 addiu s1,s1,1 > e8: 00008021 move s0,zero > ec: 2412ff00 li s2,-256 > f0: 25290000 addiu t1,t1,0 > f4: 25080000 addiu t0,t0,0 > f8: 24630000 addiu v1,v1,0 > fc: 
27de0000 addiu s8,s8,0 > 100: 26f70000 addiu s7,s7,0 > 104: 26d60000 addiu s6,s6,0 > 108: 254a0000 addiu t2,t2,0 > 10c: 00001021 move v0,zero > 110: 40040801 mfc0 a0,c0_vpecontrol > 114: 00922024 and a0,a0,s2 > 118: 00442025 or a0,v0,a0 > 11c: 40840801 mtc0 a0,c0_vpecontrol > 120: 000000c0 ehb > 124: 41020802 mftc0 at,c0_tcbind > 128: 00202021 move a0,at > 12c: 24420001 addiu v0,v0,1 > 130: 3084000f andi a0,a0,0xf > 134: 12040031 beq s0,a0,1fc <mips_mt_regdump+0x1fc> > 138: 0051282a slt a1,v0,s1 > 13c: 14a0fff4 bnez a1,110 <mips_mt_regdump+0x110> > 140: 00000000 nop > 144: 26100001 addiu s0,s0,1 > 148: 0213102a slt v0,s0,s3 > 14c: 1440fff0 bnez v0,110 <mips_mt_regdump+0x110> > 150: 00001021 move v0,zero > 154: 3c040000 lui a0,0x0 > 158: 24840000 addiu a0,a0,0 > 15c: 3c1e0000 lui s8,0x0 > 160: 3c170000 lui s7,0x0 > 164: 3c160000 lui s6,0x0 > 168: 3c130000 lui s3,0x0 > 16c: 0c000000 jal 0 <mips_mt_regdump> > 170: 3c120000 lui s2,0x0 > 174: 00008021 move s0,zero > 178: 27de0000 addiu s8,s8,0 > 17c: 26f70000 addiu s7,s7,0 > 180: 26d60000 addiu s6,s6,0 > 184: 26730000 addiu s3,s3,0 > 188: 26520000 addiu s2,s2,0 > 18c: 40020801 mfc0 v0,c0_vpecontrol > 190: 2403ff00 li v1,-256 > 194: 00431024 and v0,v0,v1 > 198: 02021025 or v0,s0,v0 > 19c: 40820801 mtc0 v0,c0_vpecontrol > 1a0: 000000c0 ehb > 1a4: 41020802 mftc0 at,c0_tcbind > 1a8: 00201821 move v1,at > 1ac: 40021002 mfc0 v0,c0_tcbind > 1b0: 1062003f beq v1,v0,2b0 <mips_mt_regdump+0x2b0> > 1b4: 00000000 nop > 1b8: 41020804 mftc0 at,c0_tchalt > 1bc: 00201821 move v1,at > 1c0: 24020001 li v0,1 > 1c4: 00400821 move at,v0 > 1c8: 41811004 mttc0 at,c0_tchalt > 1cc: 41020801 mftc0 at,c0_tcstatus > 1d0: 00203021 move a2,at > 1d4: 3c040000 lui a0,0x0 > 1d8: 02002821 move a1,s0 > 1dc: 24840000 addiu a0,a0,0 > 1e0: afa3001c sw v1,28(sp) > 1e4: 0c000000 jal 0 <mips_mt_regdump> > 1e8: afa60010 sw a2,16(sp) > 1ec: 8fa60010 lw a2,16(sp) > 1f0: 8fa3001c lw v1,28(sp) > 1f4: 080000b2 j 2c8 <mips_mt_regdump+0x2c8> > 1f8: 00c02821 move a1,a2 
> 1fc: 01202021 move a0,t1 > 200: 02002821 move a1,s0 > 204: afa3001c sw v1,28(sp) > 208: afa80014 sw t0,20(sp) > 20c: afa90010 sw t1,16(sp) > 210: 0c000000 jal 0 <mips_mt_regdump> > 214: afaa0018 sw t2,24(sp) > 218: 41010801 mftc0 at,c0_vpecontrol > 21c: 00202821 move a1,at > 220: 8fa80014 lw t0,20(sp) > 224: 0c000000 jal 0 <mips_mt_regdump> > 228: 01002021 move a0,t0 > 22c: 41010802 mftc0 at,c0_vpeconf0 > 230: 00202821 move a1,at > 234: 8fa3001c lw v1,28(sp) > 238: 0c000000 jal 0 <mips_mt_regdump> > 23c: 00602021 move a0,v1 > 240: 410c0800 mftc0 at,c0_status > 244: 00203021 move a2,at > 248: 03c02021 move a0,s8 > 24c: 0c000000 jal 0 <mips_mt_regdump> > 250: 02002821 move a1,s0 > 254: 410e0800 mftc0 at,c0_epc > 258: 00203021 move a2,at > 25c: 410e0800 mftc0 at,c0_epc > 260: 00203821 move a3,at > 264: 02e02021 move a0,s7 > 268: 0c000000 jal 0 <mips_mt_regdump> > 26c: 02002821 move a1,s0 > 270: 410d0800 mftc0 at,c0_cause > 274: 00203021 move a2,at > 278: 02c02021 move a0,s6 > 27c: 0c000000 jal 0 <mips_mt_regdump> > 280: 02002821 move a1,s0 > 284: 41100807 mftc0 at,$16,7 > 288: 00203021 move a2,at > 28c: 8faa0018 lw t2,24(sp) > 290: 02002821 move a1,s0 > 294: 0c000000 jal 0 <mips_mt_regdump> > 298: 01402021 move a0,t2 > 29c: 8fa3001c lw v1,28(sp) > 2a0: 8fa80014 lw t0,20(sp) > 2a4: 8fa90010 lw t1,16(sp) > 2a8: 08000051 j 144 <mips_mt_regdump+0x144> > 2ac: 8faa0018 lw t2,24(sp) > 2b0: 3c040000 lui a0,0x0 > 2b4: 02002821 move a1,s0 > 2b8: 0c000000 jal 0 <mips_mt_regdump> > 2bc: 24840000 addiu a0,a0,0 > 2c0: 00001821 move v1,zero > 2c4: 02802821 move a1,s4 > 2c8: 03c02021 move a0,s8 > 2cc: 0c000000 jal 0 <mips_mt_regdump> > 2d0: afa3001c sw v1,28(sp) > 2d4: 41020802 mftc0 at,c0_tcbind > 2d8: 00202821 move a1,at > 2dc: 0c000000 jal 0 <mips_mt_regdump> > 2e0: 02e02021 move a0,s7 > 2e4: 41020803 mftc0 at,c0_tcrestart > 2e8: 00202821 move a1,at > 2ec: 41020803 mftc0 at,c0_tcrestart > 2f0: 00203021 move a2,at > 2f4: 0c000000 jal 0 <mips_mt_regdump> > 2f8: 02c02021 move a0,s6 
> 2fc: 8fa3001c lw v1,28(sp) > 300: 02602021 move a0,s3 > 304: 0c000000 jal 0 <mips_mt_regdump> > 308: 00602821 move a1,v1 > 30c: 41020805 mftc0 at,c0_tccontext > 310: 00202821 move a1,at > 314: 0c000000 jal 0 <mips_mt_regdump> > 318: 02402021 move a0,s2 > 31c: 8fa3001c lw v1,28(sp) > 320: 14600003 bnez v1,330 <mips_mt_regdump+0x330> > 324: 00001021 move v0,zero > 328: 00400821 move at,v0 > 32c: 41811004 mttc0 at,c0_tchalt > 330: 26100001 addiu s0,s0,1 > 334: 0211102a slt v0,s0,s1 > 338: 1440ff94 bnez v0,18c <mips_mt_regdump+0x18c> > 33c: 00000000 nop > 340: 0c000000 jal 0 <mips_mt_regdump> > 344: 32b50001 andi s5,s5,0x1 > 348: 3c040000 lui a0,0x0 > 34c: 0c000000 jal 0 <mips_mt_regdump> > 350: 24840000 addiu a0,a0,0 > 354: 12a00004 beqz s5,368 <mips_mt_regdump+0x368> > 358: 32820400 andi v0,s4,0x400 > 35c: 41600021 evpe > 360: 000000c0 ehb > 364: 32820400 andi v0,s4,0x400 > 368: 14400003 bnez v0,378 <mips_mt_regdump+0x378> > 36c: 00000000 nop > 370: 0c000000 jal 0 <mips_mt_regdump> > 374: 00000000 nop > 378: 40011001 mfc0 at,c0_tcstatus > 37c: 32940400 andi s4,s4,0x400 > 380: 34210400 ori at,at,0x400 > 384: 38210400 xori at,at,0x400 > 388: 0281a025 or s4,s4,at > 38c: 40941001 mtc0 s4,c0_tcstatus > 390: 000000c0 ehb > 394: 8fbf0044 lw ra,68(sp) > 398: 8fbe0040 lw s8,64(sp) > 39c: 8fb7003c lw s7,60(sp) > 3a0: 8fb60038 lw s6,56(sp) > 3a4: 8fb50034 lw s5,52(sp) > 3a8: 8fb40030 lw s4,48(sp) > 3ac: 8fb3002c lw s3,44(sp) > 3b0: 8fb20028 lw s2,40(sp) > 3b4: 8fb10024 lw s1,36(sp) > 3b8: 8fb00020 lw s0,32(sp) > 3bc: 03e00008 jr ra > 3c0: 27bd0048 addiu sp,sp,72 > > > On Sat, Dec 18, 2010 at 3:05 AM, Kevin D. 
Kissell <kevink@paralogos.com> > wrote: > So, Anoop, if you get a minute for this any time in the next day or so > (after which I'll have very limited net access until next year), could you > please do an <mumble>-mips<mumble>-objdump --disassemble of your kernel > image (or even just the mips-mt.o module) from a failing kernel build and > post the disassembly of mips_mt_regdump()? The confirmation or refutation > of the theory about local_irq_save() no longer being built correctly for > SMTC would be within the first few instructions... > > /K. > > > On 12/16/10 11:58, Kevin D. Kissell wrote: > Ralf tells me that this message got blocked by the LMO server due to HTML > content. > So here it is again, textier. > > On 12/16/10 11:24, Kevin D. Kissell wrote: > On 12/16/10 07:37, STUART VENTERS wrote: > > Two other possible clues: > > The EVP is clear in the MVPControl register. > Does this say that only VPE0, T0 gets to run? > That's correct. In the maxtcs=1/maxvpes=1 boot state, it wouldn't matter. > It's just possible that setting EVP is conditional on more than one VPE > being used, but that's not the way I remember it. > > Also the EXCPT bits in VPEControl for VPE1 indicate a Gating Storage > Exception dispatch. > But that seems to conflict the EVP bit above. > I don't have a copy of the ASE spec handy to see whether those bits have a > defined power-on value, but particularly if maxvpes=1 was set at boot > time, > I would expect VPE1's registers to be in a partly random power-up state. > > Perhaps these are an artifact of getting to a good state to dump things > out. > As per my previous mail, I looked at the MT register dump source, and it > really does pull values directly > out of registers and doesn't depend on having a sane kernel stack frame. 
> The exceptions to that rule > are the reported values for TCStatus of the executing TC, which is based > on the perhaps-now-broken > assumption that local_irq_save(flags) stores the *entire* pre-invocation > value of the TCStatus register > in the flags variable, and MVPcontrol, which is based on the assumption > that dvpe() returns the pre-invocation > value of MVPcontrol. Break those assumptions, and you'll get inconsistent > state dumps like this, > and very possibly incorrect execution. Particularly if what was done was > that effectively replaces > the SMTC-specific implementation of local_irq_save()/local_irq_restore() > with something that uses > the generic MIPS32R2 atomic interrupt enable/disable instructions. That > would have been a *very* bad idea... > > Regards, > > Kevin K. > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
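[Editorial sketch, not part of the original mail.] The git blame listing in the message above shows three variants of the arch_local_irq_save macro, and they differ in what ends up in the flags value. The user-space model below mirrors each branch on a plain integer, plus the restore sequence visible at offsets 378 to 38c of the posted disassembly. All function and macro names are hypothetical, the sample register values are arbitrary, and hazard barriers and atomicity are ignored, so this only illustrates the bit manipulation.

```c
#include <assert.h>
#include <stdint.h>

#define TCSTATUS_IXMT 0x400u /* mask used in the CONFIG_MIPS_MT_SMTC branch */
#define STATUS_IE     0x1u   /* mask applied after "di" in the MIPSR2 branch */
#define STATUS_LOW5   0x1fu  /* bits cleared by ori/xori in the fallback branch */

/* SMTC branch: mfc0 / ori 0x400 / mtc0 / andi 0x400.
 * flags carries only the previous inhibit bit. */
static uint32_t save_smtc(uint32_t *tcstatus)
{
    uint32_t result = *tcstatus;
    *tcstatus = result | TCSTATUS_IXMT;
    return result & TCSTATUS_IXMT;
}

/* MIPSR2 branch: "di result" returns the old Status and clears IE,
 * then "andi result, 1" keeps only the previous IE bit. */
static uint32_t save_r2(uint32_t *status)
{
    uint32_t result = *status;
    *status = result & ~STATUS_IE;
    return result & STATUS_IE;
}

/* Fallback branch: mfc0 / ori 0x1f / xori 0x1f / mtc0.
 * As listed, result is not masked, so flags is the whole old Status. */
static uint32_t save_legacy(uint32_t *status)
{
    uint32_t result = *status;
    *status = (result | STATUS_LOW5) ^ STATUS_LOW5;
    return result;
}

/* SMTC restore as compiled into mips_mt_regdump: clear the inhibit bit
 * in the current TCStatus, then OR the saved inhibit bit back in. */
static void restore_smtc(uint32_t *tcstatus, uint32_t flags)
{
    uint32_t at = (*tcstatus | TCSTATUS_IXMT) ^ TCSTATUS_IXMT;
    *tcstatus = (flags & TCSTATUS_IXMT) | at;
}
```

Under these assumptions only the fallback branch hands back a full register image; in the SMTC branch the dump code can no longer treat flags as a complete TCStatus value, which is consistent with the blame output showing the "andi" dating from 2006.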
* RE: SMTC support status in latest git head. @ 2010-12-21 20:29 ` Anoop P.A. 0 siblings, 0 replies; 68+ messages in thread From: Anoop P.A. @ 2010-12-21 20:29 UTC (permalink / raw) To: Anoop P.A., Kevin D. Kissell, Anoop P A; +Cc: STUART VENTERS, linux-mips Sorry I misunderstood file. git blame shows that "andi" is around for quite sometime . 49a89efb include/asm-mips/irqflags.h (Ralf Baechle 2007-10-11 23:46:15 +0100 128) __asm__( df9ee292 arch/mips/include/asm/irqflags.h (David Howells 2010-10-07 14:08:55 +0100 129) " .macro arch_local_irq_save result ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 130) " .set push ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 131) " .set reorder ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 132) " .set noat 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 133) #ifdef CONFIG_MIPS_MT_SMTC 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 134) " mfc0 \\result, $2, 1 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 135) " ori $1, \\result, 0x400 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 136) " .set noreorder 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 137) " mtc0 $1, $2, 1 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 138) " andi \\result, \\result, 0x400 41c594ab include/asm-mips/interrupt.h (Ralf Baechle 2006-04-05 09:45:45 +0100 139) #elif defined(CONFIG_CPU_MIPSR2) ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 140) " di \\result 15265251 include/asm-mips/interrupt.h (Maxime Bizon 2005-12-20 06:32:19 +0100 141) " andi \\result, 1 ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 142) #else ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 143) " mfc0 \\result, $12 
c226f260 include/asm-mips/interrupt.h (Atsushi Nemoto 2006-02-03 01:34:01 +0900 144) " ori $1, \\result, 0x1f c226f260 include/asm-mips/interrupt.h (Atsushi Nemoto 2006-02-03 01:34:01 +0900 145) " xori $1, 0x1f ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 146) " .set noreorder ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 147) " mtc0 $1, $12 ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 148) #endif ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 149) " irq_disable_hazard ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 150) " .set pop ff88f8a3 include/asm-mips/interrupt.h (Ralf Baechle 2005-07-12 14:54:31 +0000 151) " .endm ^1da177e include/asm-mips/interrupt.h (Linus Torvalds 2005-04-16 15:20:36 -0700 152) > -----Original Message----- > From: linux-mips-bounce@linux-mips.org [mailto:linux-mips-bounce@linux- > mips.org] On Behalf Of Anoop P.A. > Sent: Wednesday, December 22, 2010 1:37 AM > To: Kevin D. Kissell; Anoop P A > Cc: STUART VENTERS; linux-mips@linux-mips.org > Subject: RE: SMTC support status in latest git head. > > > OK. I will check it. > > BTW following patch is responsible for irq change. > > http://git.linux- > mips.org/?p=linux.git;a=commitdiff;h=df9ee29270c11dba7d0fe0b83ce47a4d8e8d2 > 101 > > Thanks > Anoop > ________________________________________ > From: Kevin D. Kissell [mailto:kevink@paralogos.com] > Sent: Wednesday, December 22, 2010 12:23 AM > To: Anoop P A > Cc: STUART VENTERS; linux-mips@linux-mips.org; Anoop P.A. > Subject: Re: SMTC support status in latest git head. > > OK, I see why the MT register dump isn't giving us useful information. > It's not clear that it's at the root of your functional problems, though. 
> Apparently, somebody decided that it was unwholesome to propagate anything > other than the previous interrupt enable state in the flags variable > passed between irq_save() and irq_restore(). I agree philosophically, but > it does break the MT register dump function. And I'm quite sure that > there were other bits of SMTC code that knew that it was a TCStatus value, > at least in the earliest versions of the code. I'm not a gitweb power > user, but I haven't been able to figure out how to determine when the > "andi \\result 0x400" on or about line 138 of irqflags.h (at least that's > where it is in the head of tree) was checked-in. If it's at the boundary > between working and non-working versions for SMTC, it might be the cause > of the problems, but it may well not be responsible for anything other > than the problem with reporting the value in > the MT register dump - which really ought to be fixed. > > I'm in a small village in France for the holidays with no git/build system > at my disposal, but I think that if you were to tweak mips-mt.c at line > 103 to change > the > > tcstatval = flags; /* And pre-dump TCStatus is flags */ > > > > to something more like > > > > /* Pre-dump TCStatus Interrupt Inhibit bit is in flags variable > */ > > tcstatval = (read_c0_tcstatus() & ~0x400) | flags; > > > > should fix the dump. > > Regards, > > Kevin K. 
> > On 12/20/10 2:44 AM, Anoop P A wrote: > Hi Kevin, > > Please find disassembly for mips_mt_reg_dump > > Thanks > Anoop > > Disassembly of section .text: > > 00000000 <mips_mt_regdump>: > 0: 27bdffb8 addiu sp,sp,-72 > 4: 00802821 move a1,a0 > 8: afbf0044 sw ra,68(sp) > c: afbe0040 sw s8,64(sp) > 10: afb7003c sw s7,60(sp) > 14: afb60038 sw s6,56(sp) > 18: afb50034 sw s5,52(sp) > 1c: afb40030 sw s4,48(sp) > 20: afb3002c sw s3,44(sp) > 24: afb20028 sw s2,40(sp) > 28: afb10024 sw s1,36(sp) > 2c: afb00020 sw s0,32(sp) > 30: 40141001 mfc0 s4,c0_tcstatus > 34: 36810400 ori at,s4,0x400 > 38: 40811001 mtc0 at,c0_tcstatus > 3c: 32940400 andi s4,s4,0x400 > 40: 000000c0 ehb > 44: 41610001 dvpe at > 48: 0020a821 move s5,at > 4c: 000000c0 ehb > 50: 3c020000 lui v0,0x0 > 54: 24420060 addiu v0,v0,96 > 58: 00400408 jr.hb v0 > 5c: 00000000 nop > 60: 3c040000 lui a0,0x0 > 64: 24840000 addiu a0,a0,0 > 68: 0c000000 jal 0 <mips_mt_regdump> > 6c: afa50010 sw a1,16(sp) > 70: 3c040000 lui a0,0x0 > 74: 0c000000 jal 0 <mips_mt_regdump> > 78: 24840000 addiu a0,a0,0 > 7c: 8fa50010 lw a1,16(sp) > 80: 3c040000 lui a0,0x0 > 84: 0c000000 jal 0 <mips_mt_regdump> > 88: 24840000 addiu a0,a0,0 > 8c: 3c040000 lui a0,0x0 > 90: 24840000 addiu a0,a0,0 > 94: 0c000000 jal 0 <mips_mt_regdump> > 98: 02a02821 move a1,s5 > 9c: 40110002 mfc0 s1,c0_mvpconf0 > a0: 3c040000 lui a0,0x0 > a4: 02202821 move a1,s1 > a8: 0c000000 jal 0 <mips_mt_regdump> > ac: 24840000 addiu a0,a0,0 > b0: 3c040000 lui a0,0x0 > b4: 0c000000 jal 0 <mips_mt_regdump> > b8: 24840000 addiu a0,a0,0 > bc: 7e331a80 ext s3,s1,0xa,0x4 > c0: 3c090000 lui t1,0x0 > c4: 323100ff andi s1,s1,0xff > c8: 3c080000 lui t0,0x0 > cc: 3c030000 lui v1,0x0 > d0: 3c1e0000 lui s8,0x0 > d4: 3c170000 lui s7,0x0 > d8: 3c160000 lui s6,0x0 > dc: 3c0a0000 lui t2,0x0 > e0: 26730001 addiu s3,s3,1 > e4: 26310001 addiu s1,s1,1 > e8: 00008021 move s0,zero > ec: 2412ff00 li s2,-256 > f0: 25290000 addiu t1,t1,0 > f4: 25080000 addiu t0,t0,0 > f8: 24630000 addiu v1,v1,0 > fc: 
27de0000 addiu s8,s8,0 > 100: 26f70000 addiu s7,s7,0 > 104: 26d60000 addiu s6,s6,0 > 108: 254a0000 addiu t2,t2,0 > 10c: 00001021 move v0,zero > 110: 40040801 mfc0 a0,c0_vpecontrol > 114: 00922024 and a0,a0,s2 > 118: 00442025 or a0,v0,a0 > 11c: 40840801 mtc0 a0,c0_vpecontrol > 120: 000000c0 ehb > 124: 41020802 mftc0 at,c0_tcbind > 128: 00202021 move a0,at > 12c: 24420001 addiu v0,v0,1 > 130: 3084000f andi a0,a0,0xf > 134: 12040031 beq s0,a0,1fc <mips_mt_regdump+0x1fc> > 138: 0051282a slt a1,v0,s1 > 13c: 14a0fff4 bnez a1,110 <mips_mt_regdump+0x110> > 140: 00000000 nop > 144: 26100001 addiu s0,s0,1 > 148: 0213102a slt v0,s0,s3 > 14c: 1440fff0 bnez v0,110 <mips_mt_regdump+0x110> > 150: 00001021 move v0,zero > 154: 3c040000 lui a0,0x0 > 158: 24840000 addiu a0,a0,0 > 15c: 3c1e0000 lui s8,0x0 > 160: 3c170000 lui s7,0x0 > 164: 3c160000 lui s6,0x0 > 168: 3c130000 lui s3,0x0 > 16c: 0c000000 jal 0 <mips_mt_regdump> > 170: 3c120000 lui s2,0x0 > 174: 00008021 move s0,zero > 178: 27de0000 addiu s8,s8,0 > 17c: 26f70000 addiu s7,s7,0 > 180: 26d60000 addiu s6,s6,0 > 184: 26730000 addiu s3,s3,0 > 188: 26520000 addiu s2,s2,0 > 18c: 40020801 mfc0 v0,c0_vpecontrol > 190: 2403ff00 li v1,-256 > 194: 00431024 and v0,v0,v1 > 198: 02021025 or v0,s0,v0 > 19c: 40820801 mtc0 v0,c0_vpecontrol > 1a0: 000000c0 ehb > 1a4: 41020802 mftc0 at,c0_tcbind > 1a8: 00201821 move v1,at > 1ac: 40021002 mfc0 v0,c0_tcbind > 1b0: 1062003f beq v1,v0,2b0 <mips_mt_regdump+0x2b0> > 1b4: 00000000 nop > 1b8: 41020804 mftc0 at,c0_tchalt > 1bc: 00201821 move v1,at > 1c0: 24020001 li v0,1 > 1c4: 00400821 move at,v0 > 1c8: 41811004 mttc0 at,c0_tchalt > 1cc: 41020801 mftc0 at,c0_tcstatus > 1d0: 00203021 move a2,at > 1d4: 3c040000 lui a0,0x0 > 1d8: 02002821 move a1,s0 > 1dc: 24840000 addiu a0,a0,0 > 1e0: afa3001c sw v1,28(sp) > 1e4: 0c000000 jal 0 <mips_mt_regdump> > 1e8: afa60010 sw a2,16(sp) > 1ec: 8fa60010 lw a2,16(sp) > 1f0: 8fa3001c lw v1,28(sp) > 1f4: 080000b2 j 2c8 <mips_mt_regdump+0x2c8> > 1f8: 00c02821 move a1,a2 
> 1fc: 01202021 move a0,t1 > 200: 02002821 move a1,s0 > 204: afa3001c sw v1,28(sp) > 208: afa80014 sw t0,20(sp) > 20c: afa90010 sw t1,16(sp) > 210: 0c000000 jal 0 <mips_mt_regdump> > 214: afaa0018 sw t2,24(sp) > 218: 41010801 mftc0 at,c0_vpecontrol > 21c: 00202821 move a1,at > 220: 8fa80014 lw t0,20(sp) > 224: 0c000000 jal 0 <mips_mt_regdump> > 228: 01002021 move a0,t0 > 22c: 41010802 mftc0 at,c0_vpeconf0 > 230: 00202821 move a1,at > 234: 8fa3001c lw v1,28(sp) > 238: 0c000000 jal 0 <mips_mt_regdump> > 23c: 00602021 move a0,v1 > 240: 410c0800 mftc0 at,c0_status > 244: 00203021 move a2,at > 248: 03c02021 move a0,s8 > 24c: 0c000000 jal 0 <mips_mt_regdump> > 250: 02002821 move a1,s0 > 254: 410e0800 mftc0 at,c0_epc > 258: 00203021 move a2,at > 25c: 410e0800 mftc0 at,c0_epc > 260: 00203821 move a3,at > 264: 02e02021 move a0,s7 > 268: 0c000000 jal 0 <mips_mt_regdump> > 26c: 02002821 move a1,s0 > 270: 410d0800 mftc0 at,c0_cause > 274: 00203021 move a2,at > 278: 02c02021 move a0,s6 > 27c: 0c000000 jal 0 <mips_mt_regdump> > 280: 02002821 move a1,s0 > 284: 41100807 mftc0 at,$16,7 > 288: 00203021 move a2,at > 28c: 8faa0018 lw t2,24(sp) > 290: 02002821 move a1,s0 > 294: 0c000000 jal 0 <mips_mt_regdump> > 298: 01402021 move a0,t2 > 29c: 8fa3001c lw v1,28(sp) > 2a0: 8fa80014 lw t0,20(sp) > 2a4: 8fa90010 lw t1,16(sp) > 2a8: 08000051 j 144 <mips_mt_regdump+0x144> > 2ac: 8faa0018 lw t2,24(sp) > 2b0: 3c040000 lui a0,0x0 > 2b4: 02002821 move a1,s0 > 2b8: 0c000000 jal 0 <mips_mt_regdump> > 2bc: 24840000 addiu a0,a0,0 > 2c0: 00001821 move v1,zero > 2c4: 02802821 move a1,s4 > 2c8: 03c02021 move a0,s8 > 2cc: 0c000000 jal 0 <mips_mt_regdump> > 2d0: afa3001c sw v1,28(sp) > 2d4: 41020802 mftc0 at,c0_tcbind > 2d8: 00202821 move a1,at > 2dc: 0c000000 jal 0 <mips_mt_regdump> > 2e0: 02e02021 move a0,s7 > 2e4: 41020803 mftc0 at,c0_tcrestart > 2e8: 00202821 move a1,at > 2ec: 41020803 mftc0 at,c0_tcrestart > 2f0: 00203021 move a2,at > 2f4: 0c000000 jal 0 <mips_mt_regdump> > 2f8: 02c02021 move a0,s6 
> 2fc: 8fa3001c lw v1,28(sp) > 300: 02602021 move a0,s3 > 304: 0c000000 jal 0 <mips_mt_regdump> > 308: 00602821 move a1,v1 > 30c: 41020805 mftc0 at,c0_tccontext > 310: 00202821 move a1,at > 314: 0c000000 jal 0 <mips_mt_regdump> > 318: 02402021 move a0,s2 > 31c: 8fa3001c lw v1,28(sp) > 320: 14600003 bnez v1,330 <mips_mt_regdump+0x330> > 324: 00001021 move v0,zero > 328: 00400821 move at,v0 > 32c: 41811004 mttc0 at,c0_tchalt > 330: 26100001 addiu s0,s0,1 > 334: 0211102a slt v0,s0,s1 > 338: 1440ff94 bnez v0,18c <mips_mt_regdump+0x18c> > 33c: 00000000 nop > 340: 0c000000 jal 0 <mips_mt_regdump> > 344: 32b50001 andi s5,s5,0x1 > 348: 3c040000 lui a0,0x0 > 34c: 0c000000 jal 0 <mips_mt_regdump> > 350: 24840000 addiu a0,a0,0 > 354: 12a00004 beqz s5,368 <mips_mt_regdump+0x368> > 358: 32820400 andi v0,s4,0x400 > 35c: 41600021 evpe > 360: 000000c0 ehb > 364: 32820400 andi v0,s4,0x400 > 368: 14400003 bnez v0,378 <mips_mt_regdump+0x378> > 36c: 00000000 nop > 370: 0c000000 jal 0 <mips_mt_regdump> > 374: 00000000 nop > 378: 40011001 mfc0 at,c0_tcstatus > 37c: 32940400 andi s4,s4,0x400 > 380: 34210400 ori at,at,0x400 > 384: 38210400 xori at,at,0x400 > 388: 0281a025 or s4,s4,at > 38c: 40941001 mtc0 s4,c0_tcstatus > 390: 000000c0 ehb > 394: 8fbf0044 lw ra,68(sp) > 398: 8fbe0040 lw s8,64(sp) > 39c: 8fb7003c lw s7,60(sp) > 3a0: 8fb60038 lw s6,56(sp) > 3a4: 8fb50034 lw s5,52(sp) > 3a8: 8fb40030 lw s4,48(sp) > 3ac: 8fb3002c lw s3,44(sp) > 3b0: 8fb20028 lw s2,40(sp) > 3b4: 8fb10024 lw s1,36(sp) > 3b8: 8fb00020 lw s0,32(sp) > 3bc: 03e00008 jr ra > 3c0: 27bd0048 addiu sp,sp,72 > > > On Sat, Dec 18, 2010 at 3:05 AM, Kevin D. 
Kissell <kevink@paralogos.com> > wrote: > So, Anoop, if you get a minute for this any time in the next day or so > (after which I'll have very limited net access until next year), could you > please do an <mumble>-mips<mumble>-objdump --disassemble of your kernel > image (or even just the mips-mt.o module) from a failing kernel build and > post the disassembly of mips_mt_regdump()? The confirmation or refutation > of the theory about local_irq_save() no longer being built correctly for > SMTC would be within the first few instructions... > > /K. > > > On 12/16/10 11:58, Kevin D. Kissell wrote: > Ralf tells me that this message got blocked by the LMO server due to HTML > content. > So here it is again, textier. > > On 12/16/10 11:24, Kevin D. Kissell wrote: > On 12/16/10 07:37, STUART VENTERS wrote: > > Two other possible clues: > > The EVP is clear in the MVPControl register. > Does this say that only VPE0, T0 gets to run? > That's correct. In the maxtcs=1/maxvpes=1 boot state, it wouldn't matter. > It's just possible that setting EVP is conditional on more than one VPE > being used, but that's not the way I remember it. > > Also the EXCPT bits in VPEControl for VPE1 indicate a Gating Storage > Exception dispatch. > But that seems to conflict the EVP bit above. > I don't have a copy of the ASE spec handy to see whether those bits have a > defined power-on value, but particularly if maxvpes=1 was set at boot > time, > I would expect VPE1's registers to be in a partly random power-up state. > > Perhaps these are an artifact of getting to a good state to dump things > out. > As per my previous mail, I looked at the MT register dump source, and it > really does pull values directly > out of registers and doesn't depend on having a sane kernel stack frame. 
> The exceptions to that rule > are the reported values for TCStatus of the executing TC, which is based > on the perhaps-now-broken > assumption that local_irq_save(flags) stores the *entire* pre-invocation > value of the TCStatus register > in the flags variable, and MVPControl, which is based on the assumption > that dvpe() returns the pre-invocation > value of MVPControl. Break those assumptions, and you'll get inconsistent > state dumps like this, > and very possibly incorrect execution. Particularly if what was done > effectively replaces > the SMTC-specific implementation of local_irq_save()/local_irq_restore() > with something that uses > the generic MIPS32R2 atomic interrupt enable/disable instructions. That > would have been a *very* bad idea... > > Regards, > > Kevin K. > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head.
  2010-12-21 20:29 ` Anoop P.A.
  (?)
@ 2010-12-22 10:27 ` Kevin D. Kissell
  2010-12-22 11:35 ` Anoop P A
  -1 siblings, 1 reply; 68+ messages in thread
From: Kevin D. Kissell @ 2010-12-22 10:27 UTC (permalink / raw)
  To: Anoop P.A.; +Cc: Anoop P A, STUART VENTERS, linux-mips

> Sorry I misunderstood file. git blame shows that "andi" is around for quite
> some time.

I've never used git blame, so I don't know how far it can be trusted,
but if that change was made in 2006, that would predate the major
breakage by several years. So my suggestion from yesterday is a
reasonable one:

> I think that if you were to tweak mips-mt.c at line 103 to change
> the
>
> tcstatval = flags; /* And pre-dump TCStatus is flags */
>
> to something more like
>
> /* Pre-dump TCStatus Interrupt Inhibit bit is in flags variable */
> tcstatval = (read_c0_tcstatus() & ~0x400) | flags;
>
> should fix the dump.

With that patch, if you re-run the experiment of hang-breakout-dump, we
might be able to deduce something.

Ralf wrote to me independently to say that my message from yesterday
with that suggestion and some other commentary got eaten once again by
the LMO mail forwarder because of the HTML content. With all due
respect, I'm using a very standard open-source mail client (Thunderbird)
with a very normal option (reply to text with text, HTML with HTML).
Perhaps it's the LMO mail system that needs to change, and not the
mail configurations of the whole LMO community.

Regards,

Kevin K.

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head.
  2010-12-22 10:27 ` Kevin D. Kissell
@ 2010-12-22 11:35 ` Anoop P A
  2010-12-22 11:37 ` Kevin D. Kissell
  0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-22 11:35 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: Anoop P.A., STUART VENTERS, linux-mips

On Wed, 2010-12-22 at 02:27 -0800, Kevin D. Kissell wrote:
> > Sorry I misunderstood file. git blame shows that "andi" is around for
> > quite some time.
>
> I've never used git blame, so I don't know how far it can be trusted,
> but if that change was made in 2006, that would predate the major
> breakage by several years. So my suggestion from yesterday is a
> reasonable one:

That change is also present in the 2.6.32 kernel, which boots. The
corresponding patch can be found in gitweb:
http://git.linux-mips.org/?p=linux.git;a=commitdiff;h=41c594ab65fc89573af296d192aa5235d09717ab#patch39

> > I think that if you were to tweak mips-mt.c at line 103 to change
> > the
> >
> > tcstatval = flags; /* And pre-dump TCStatus is flags */
> >
> > to something more like
> >
> > /* Pre-dump TCStatus Interrupt Inhibit bit is in flags variable */
> > tcstatval = (read_c0_tcstatus() & ~0x400) | flags;
> >
> > should fix the dump.
>
> With that patch, if you re-run the experiment of hang-breakout-dump, we
> might be able to deduce something.

Here is the dump with the patch.

[ 0.000000] Calibrating delay loop... === MIPS MT State Dump ===
[ 0.000000] -- Global State --
[ 0.000000] MVPControl Passed: 00000000
[ 0.000000] MVPControl Read: 00000000
[ 0.000000] MVPConf0 : a8008406
[ 0.000000] -- per-VPE State --
[ 0.000000] VPE 0
[ 0.000000] VPEControl : 00000000
[ 0.000000] VPEConf0 : 800f0003
[ 0.000000] VPE0.Status : 11004001
[ 0.000000] VPE0.EPC : 80100000 _stext+0x0/0x10
[ 0.000000] VPE0.Cause : e080407c
[ 0.000000] VPE0.Config7 : 00010000
[ 0.000000] VPE 1
[ 0.000000] VPEControl : 00030000
[ 0.000000] VPEConf0 : 800f0000
[ 0.000000] VPE1.Status : 00407904
[ 0.000000] VPE1.EPC : fffdffff 0xfffdffff
[ 0.000000] VPE1.Cause : 4000027c
[ 0.000000] VPE1.Config7 : 00010000
[ 0.000000] -- per-TC State --
[ 0.000000] TC 0 (current TC with VPE EPC above)
[ 0.000000] TCStatus : 11004001
[ 0.000000] TCBind : 00000000
[ 0.000000] TCRestart : 803fc408 printk+0x10/0x30
[ 0.000000] TCHalt : 00000000
[ 0.000000] TCContext : 00000000
[ 0.000000] TC 1
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00200001
[ 0.000000] TCRestart : 3ffffffe 0x3ffffffe
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : efffffff
[ 0.000000] TC 2
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00400001
[ 0.000000] TCRestart : ffffffee 0xffffffee
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : efffffbf
[ 0.000000] TC 3
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00600001
[ 0.000000] TCRestart : ffe00200 0xffe00200
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 7fffb77f
[ 0.000000] TC 4
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00800001
[ 0.000000] TCRestart : ffe00200 0xffe00200
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 7ffdf736
[ 0.000000] TC 5
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00a00001
[ 0.000000] TCRestart : ffe00200 0xffe00200
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : ee5ffff7
[ 0.000000] TC 6
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00c00001
[ 0.000000] TCRestart : f7ff7ffe 0xf7ff7ffe
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : e6fffffb
[ 0.000000] Counter Interrupts taken per CPU (TC)
[ 0.000000] 0: 0
[ 0.000000] 1: 0
[ 0.000000] Self-IPI invocations:
[ 0.000000] 0: 0
[ 0.000000] 1: 0
[ 0.000000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] 0 Recoveries of "stolen" FPU
[ 0.000000] ===========================
[ 0.010000] === MIPS MT State Dump ===
[ 0.010000] -- Global State --
[ 0.010000] MVPControl Passed: 00000000
[ 0.010000] MVPControl Read: 00000000
[ 0.010000] MVPConf0 : a8008406
[ 0.010000] -- per-VPE State --
[ 0.010000] VPE 0
[ 0.010000] VPEControl : 00000000
[ 0.010000] VPEConf0 : 800f0003
[ 0.010000] VPE0.Status : 18004000
[ 0.010000] VPE0.EPC : 8010c9b4 mips_mt_regdump+0x3a4/0x3d4
[ 0.010000] VPE0.Cause : 50804000
[ 0.010000] VPE0.Config7 : 00010000
[ 0.010000] VPE 1
[ 0.010000] VPEControl : 00030000
[ 0.010000] VPEConf0 : 800f0000
[ 0.010000] VPE1.Status : 00407904
[ 0.010000] VPE1.EPC : fffdffff 0xfffdffff
[ 0.010000] VPE1.Cause : 4000027c
[ 0.010000] VPE1.Config7 : 00010000
[ 0.010000] -- per-TC State --
[ 0.010000] TC 0 (current TC with VPE EPC above)
[ 0.010000] TCStatus : 18004000
[ 0.010000] TCBind : 00000000
[ 0.010000] TCRestart : 803fc408 printk+0x10/0x30
[ 0.010000] TCHalt : 00000000
[ 0.010000] TCContext : 00000000
[ 0.010000] TC 1
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00200001
[ 0.010000] TCRestart : 3ffffffe 0x3ffffffe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : efffffff
[ 0.010000] TC 2
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00400001
[ 0.010000] TCRestart : ffffffee 0xffffffee
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : efffffbf
[ 0.010000] TC 3
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00600001
[ 0.010000] TCRestart : ffe00200 0xffe00200
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 7fffb77f
[ 0.010000] TC 4
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00800001
[ 0.010000] TCRestart : ffe00200 0xffe00200
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 7ffdf736
[ 0.010000] TC 5
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00a00001
[ 0.010000] TCRestart : ffe00200 0xffe00200
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : ee5ffff7
[ 0.010000] TC 6
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00c00001
[ 0.010000] TCRestart : f7ff7ffe 0xf7ff7ffe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : e6fffffb
[ 0.010000] Counter Interrupts taken per CPU (TC)
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] Self-IPI invocations:
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] 0 Recoveries of "stolen" FPU
[ 0.010000] ===========================

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head.
  2010-12-22 11:35 ` Anoop P A
@ 2010-12-22 11:37 ` Kevin D. Kissell
  2010-12-22 11:51 ` Anoop P A
  0 siblings, 1 reply; 68+ messages in thread
From: Kevin D. Kissell @ 2010-12-22 11:37 UTC (permalink / raw)
  To: Anoop P A; +Cc: Anoop P.A., STUART VENTERS, linux-mips

Thanks. This is indeed strange. The VPE0 Status and TC0 TCStatus/Cause
all indicate that interrupts are enabled and not inhibited at the per-TC
level, and the presumed timer interrupt, in the 0x4000 bit, is present
and not masked off. Logically, the system must be entering (and
exiting) the interrupt handler, yet the timer calibration isn't
completing. That leaves more complex possible explanations for failure,
most of which would fall into two categories:

1) The platform interrupt handler is failing to decode the event
   properly as a timer event.
2) Despite there being only one TC active, the calibration code is
   waiting for some handshake from another "CPU".

To test the first, you might consider adding a printk() to the case of
a "spurious" timer-like interrupt being detected and ignored...

Regards,

Kevin K.

On 12/22/10 3:35 AM, Anoop P A wrote:
> Here is the dump with the patch.

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head.
  2010-12-22 11:37 ` Kevin D. Kissell
@ 2010-12-22 11:51 ` Anoop P A
  2010-12-22 13:03 ` Kevin D. Kissell
  0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-22 11:51 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: Anoop P.A., STUART VENTERS, linux-mips

On Wed, 2010-12-22 at 03:37 -0800, Kevin D. Kissell wrote:
> Thanks. This is indeed strange. The VPE0 Status and TC0 TCStatus/Cause
> all indicate that interrupts are enabled and not inhibited at the per-TC
> level, and the presumed timer interrupt, in the 0x4000 bit, is present
> and not masked off. Logically, the system must be entering (and
> exiting) the interrupt handler, yet the timer calibration isn't
> completing. That leaves more complex possible explanations for failure,
> most of which would fall into two categories:
>
> 1) The platform interrupt handler is failing to decode the event
>    properly as a timer event.
> 2) Despite there being only one TC active, the calibration code is
>    waiting for some handshake from another "CPU".
>
> To test the first, you might consider adding a printk() to the case of
> a "spurious" timer-like interrupt being detected and ignored...

I have tried it. Only one interrupt arrives; the platform handler
detects it as a timer interrupt and acknowledges it properly. You can
see the timestamp change in the logs.

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-22 11:51 ` Anoop P A @ 2010-12-22 13:03 ` Kevin D. Kissell 2010-12-22 16:34 ` STUART VENTERS 2010-12-23 21:09 ` STUART VENTERS 0 siblings, 2 replies; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-22 13:03 UTC (permalink / raw) To: Anoop P A; +Cc: Anoop P.A., STUART VENTERS, linux-mips On 12/22/10 3:51 AM, Anoop P A wrote: > On Wed, 2010-12-22 at 03:37 -0800, Kevin D. Kissell wrote: >> Thanks. This is indeed strange. The VPE0 Status and TC0 TCStatus/Cause >> all indicate that interrupts are enabled and not inhibited at the per-TC >> level, and the presumed timer interrupt, in the 0x4000 bit, is present >> and not masked-off. Logically, the system must be entering (and >> exiting) the interrupt handler, yet the timer calibration isn't >> completing. That leaves more complex possible explanations for failure, >> most of which would fall into two categories: >> >> 1) The platform interrupt handler is failing to decode the event >> properly as a timer event. >> 2) Despite there being only one TC active, the calibration code is >> waiting for some handshake from another "CPU" >> >> To test the first, you might consider adding a kprintf() to the case of >> a "spurious" timer-like interrupt being detected and ignored... > I have tried it . only one interrupt is coming and platform handler > detect it as timer interrupt and acknowledges properly . you can see a > time stamp change in the logs. That's really strange. And your timer interrupt is definitely on the interrupt that corresponds to the 0x4000 mask? I may have written the MT spec and the original SMTC code, but I don't have a copy of the spec, and it's been a few years, and I can't interpret the MVP and VPE control/config values. But I just don't see how the processor could not be taking more interrupts. 
Stuart did decode the global/VPE state enough to observe that global multithreaded execution wasn't enabled, which is indeed strange - it shouldn't matter for single-TC execution, but I don't recall there being any special-case in the SMTC initialization that bypassed that enable. That makes me suspect that maybe someone changed the initialization sequence in a way that bypasses one of the canonical initialization steps in a way that would break SMTC, but I don't know why that would result in the interrupt behavior you observe. It might be yet another blind alley, but could you add/arm diagnostic output for each of the initialization functions in smtc.c? Ah, yes, and one other thing. You should add a dump of ErrorEPC to the MT register dump. I did it for myself once upon a time when I was confronted with a similar mystery, but never filed a patch. If you're breaking in with NMI, that could help identify more precisely where it's locking up. You really ought to try to borrow an EJTAG probe. It would save us both a lot of time. And my time to trouble-shoot this with you is limited. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: SMTC support status in latest git head. @ 2010-12-22 16:34 ` STUART VENTERS 0 siblings, 0 replies; 68+ messages in thread From: STUART VENTERS @ 2010-12-22 16:34 UTC (permalink / raw) To: Anoop P A; +Cc: Anoop P.A., linux-mips, Kevin D. Kissell Anoop, Nothing jumps out to me in the new set of register values. It might be worth dumping all the CP0 registers? I'm especially interested in the Config3 to see the VEIC bit. The timer registers might be useful as well. Regards, Stuart -----Original Message----- From: Kevin D. Kissell [mailto:kevink@paralogos.com] Sent: Wednesday, December 22, 2010 7:03 AM To: Anoop P A Cc: Anoop P.A.; STUART VENTERS; linux-mips@linux-mips.org Subject: Re: SMTC support status in latest git head. On 12/22/10 3:51 AM, Anoop P A wrote: > On Wed, 2010-12-22 at 03:37 -0800, Kevin D. Kissell wrote: >> Thanks. This is indeed strange. The VPE0 Status and TC0 TCStatus/Cause >> all indicate that interrupts are enabled and not inhibited at the per-TC >> level, and the presumed timer interrupt, in the 0x4000 bit, is present >> and not masked-off. Logically, the system must be entering (and >> exiting) the interrupt handler, yet the timer calibration isn't >> completing. That leaves more complex possible explanations for failure, >> most of which would fall into two categories: >> >> 1) The platform interrupt handler is failing to decode the event >> properly as a timer event. >> 2) Despite there being only one TC active, the calibration code is >> waiting for some handshake from another "CPU" >> >> To test the first, you might consider adding a kprintf() to the case of >> a "spurious" timer-like interrupt being detected and ignored... > I have tried it . only one interrupt is coming and platform handler > detect it as timer interrupt and acknowledges properly . you can see a > time stamp change in the logs. That's really strange. And your timer interrupt is definitely on the interrupt that corresponds to the 0x4000 mask? 
I may have written the MT spec and the original SMTC code, but I don't have a copy of the spec, and it's been a few years, and I can't interpret the MVP and VPE control/config values. But I just don't see how the processor could not be taking more interrupts. Stuart did decode the global/VPE state enough to observe that global multithreaded execution wasn't enabled, which is indeed strange - it shouldn't matter for single-TC execution, but I don't recall there being any special-case in the SMTC initialization that bypassed that enable. That makes me suspect that maybe someone changed the initialization sequence in a way that bypasses one of the canonical initialization steps in a way that would break SMTC, but I don't know why that would result in the interrupt behavior you observe. It might be yet another blind alley, but could you add/arm diagnostic output for each of the initialization functions in smtc.c? Ah, yes, and one other thing. You should add a dump of ErrorEPC to the MT register dump. I did it for myself once upon a time when I was confronted with a similar mystery, but never filed a patch. If you're breaking in with NMI, that could help identify more precisely where it's locking up. You really ought to try to borrow an EJTAG probe. It would save us both a lot of time. And my time to trouble-shoot this with you is limited. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: SMTC support status in latest git head. @ 2010-12-23 21:09 ` STUART VENTERS 0 siblings, 0 replies; 68+ messages in thread
From: STUART VENTERS @ 2010-12-23 21:09 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: Anoop P.A., linux-mips, Anoop P A

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

Kevin,

I'm not sure if it's useful, but I finally got the time to look at the two kernel versions Anoop pointed out.

works      2.6.32-stable with patch 804
works_not  2.6.33-stable

Grepping for files with CONFIG_MIPS_MT_SMTC and looking for timer-interrupt-related changes found the following differences:

arch/mips/include/asm/irq.h
arch/mips/kernel/irq.c
    do_IRQ

arch/mips/include/asm/stackframe.h
    SAVE_SOME SAVE_TEMP get/set_saved_sp

arch/mips/include/asm/time.h
    clocksource_set_clock

arch/mips/kernel/process.c
    cpu_idle

arch/mips/kernel/smtc.c
    __irq_entry
    ipi_decode
    SMTC_CLOCK_TICK

Enclosed are the two subsets of files for a more expert look.

I'll try to look in more detail after Christmas.

Cheers,

Stuart

[-- Attachment #2: foo.tar.gz --]
[-- Type: application/x-gzip, Size: 46685 bytes --]

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-23 21:09 ` STUART VENTERS (?) @ 2010-12-24 12:32 ` Kevin D. Kissell 2010-12-24 14:39 ` Anoop P A -1 siblings, 1 reply; 68+ messages in thread
From: Kevin D. Kissell @ 2010-12-24 12:32 UTC (permalink / raw)
To: STUART VENTERS; +Cc: Anoop P.A., linux-mips, Anoop P A

Thank you, Stuart! I've spotted some definite breakage to SMTC between those versions. In arch/mips/include/asm/stackframe.h, someone moved the store of the Status register value in SAVE_SOME (line 169 or 204, depending on the version) from two instructions after the mfc0 to a point after the #ifdef for SMTC, presumably to get better pipelining of the register access. Unfortunately, the v1 register is also used in the SMTC-specific fragment to save TCStatus, so the Status value gets clobbered before it gets stored. This will eventually result in the Status register getting a TCStatus value, which has some bits in common but isn't identical, and sooner or later Bad Things will happen.

I'm a little surprised this wasn't caught by visual inspection of the patch.

Possible solutions would include moving the store of the CP0_STATUS value back to the block above the #ifdef, or, to retain whatever performance advantage was obtained by moving the store downward, using v0/$2 instead of v1/$3 as the staging register for the TCStatus value. I'd lean toward the second option, but I'm not in a position to test and submit a patch just now.

Regards,

Kevin K.

On 12/23/10 1:09 PM, STUART VENTERS wrote:
> Kevin, > > I'm not sure if it's useful, > but finally I got the time to look at the two kernel versions Anoop pointed out.
> works 2.6.32-stable with patch 804 > works_not 2.6.33-stable > > greping for files with CONFIG_MIPS_MT_SMTC > and looking for timer interrupt related stuff found the following differences: > > > arch/mips/include/asm/irq.h > arch/mips/kernel/irq.c > do_IRQ > > arch/mips/include/asm/stackframe.h > SAVE_SOME SAVE_TEMP get/set_saved_sp > > arch/mips/include/asm/time.h > clocksource_set_clock > > arch/mips/kernel/process.c > cpu_idle > > arch/mips/kernel/smtc.c > __irq_entry > ipi_decode > SMTC_CLOCK_TICK > > > Enclosed are the two subsets of files for a more expert look. > > I'll try to look in more detail after Christmas. > > > Cheers, > > Stuart > > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-24 12:32 ` Kevin D. Kissell @ 2010-12-24 14:39 ` Anoop P A 2010-12-24 14:53 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-24 14:39 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

Hi Kevin, Stuart,

Woohoo, you guys spotted it!

http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be the culprit.

Once I restored the previous version of stackframe.h, 2.6.33-stable started booting!

Thanks,
Anoop

On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote:
> Thank you, Stuart! I've spotted some definite breakage to SMTC between > those versions. In arch/mips/include/asm/stackframe.h, someone moved > the store of the Status register value in SAVE_SOME (line 169 or 204, > depending on the version) from two instructions after the mfc0 to a > point after the #ifdef for SMTC, presumably to get better pipelining of > the register access. Unfortunately, the v1 register is also used in the > SMTC-specific fragment to save TCStatus, so the Status value gets > clobbered before it gets stored. This will eventually result in the > Status register getting a TCStatus value, which has some bits in common, > but isn't identical and sooner or later Bad Things will happen. > > I'm a little surprised this wasn't caught by visual inspection of the patch. > > Possible solutions would include reverting the store of the CP0_STATUS > value to the block above the #ifdef, or, to retain whatever performance > advantage was obtained by moving the store downward, to use v0/$2 > instead of v1/$3, as the staging register for the TCStatus value. I'd > lean toward the second option, but I'm not in a position to test and > submit a patch just now. > > Regards, > > Kevin K. > > On 12/23/10 1:09 PM, STUART VENTERS wrote: > > Kevin, > > > > I'm not sure if it's useful, > > but finally I got the time to look at the two kernel versions Anoop pointed out.
> > works 2.6.32-stable with patch 804 > > works_not 2.6.33-stable > > > > greping for files with CONFIG_MIPS_MT_SMTC > > and looking for timer interrupt related stuff found the following differences: > > > > > > arch/mips/include/asm/irq.h > > arch/mips/kernel/irq.c > > do_IRQ > > > > arch/mips/include/asm/stackframe.h > > SAVE_SOME SAVE_TEMP get/set_saved_sp > > > > arch/mips/include/asm/time.h > > clocksource_set_clock > > > > arch/mips/kernel/process.c > > cpu_idle > > > > arch/mips/kernel/smtc.c > > __irq_entry > > ipi_decode > > SMTC_CLOCK_TICK > > > > > > Enclosed are the two subsets of files for a more expert look. > > > > I'll try to look in more detail after Christmas. > > > > > > Cheers, > > > > Stuart > > > > > > > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-24 14:39 ` Anoop P A @ 2010-12-24 14:53 ` Kevin D. Kissell 2010-12-24 16:02 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-24 14:53 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips [-- Attachment #1: Type: text/plain, Size: 2748 bytes --] Excellent! Now, does the attached patch (relative to 2.6.37.11) also fix things, while preserving the other fixes and performance enhancements? /K. On 12/24/10 6:39 AM, Anoop P A wrote: > Hi Kevin, Stuart , > > Woohooo You guys spotted !. > > http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > the culprit > > Once I restored previous version of stackframe.h 2.6.33-stable started > booting !. > > Thanks, > Anoop > > On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: >> Thank you, Stuart! I've spotted some definite breakage to SMTC between >> those versions. In arch/mips/include/asm/stackframe.h, someone moved >> the store of the Status register value in SAVE_SOME (line 169 or 204, >> depending on the version) from two instructions after the mfc0 to a >> point after the #ifdef for SMTC, presumably to get better pipelining of >> the register access. Unfortunately, the v1 register is also used in the >> SMTC-specific fragment to save TCStatus, so the Status value gets >> clobbered before it gets stored. This will eventually result in the >> Status register getting a TCStatus value, which has some bits on common, >> but isn't identical and sooner or later Bad Things will happen. >> >> I'm a little surprised this wasn't caught by visual inspection of the patch. >> >> Possible solutions would include reverting the store of the CP0_STATUS >> value to the block above the #ifdef, or, to retain whatever performance >> advantage was obtained by moving the store downward, to use v0/$2 >> instead of v1/$3, as the staging register for the TCStatus value. 
I'd >> lean toward the second option, but I'm not in a position to test and >> submit a patch just now. >> >> Regards, >> >> Kevin K. >> >> On 12/23/10 1:09 PM, STUART VENTERS wrote: >>> Kevin, >>> >>> I'm not sure if it's useful, >>> but finally I got the time to look at the two kernel versions Anoop pointed out. >>> works 2.6.32-stable with patch 804 >>> works_not 2.6.33-stable >>> >>> greping for files with CONFIG_MIPS_MT_SMTC >>> and looking for timer interrupt related stuff found the following differences: >>> >>> >>> arch/mips/include/asm/irq.h >>> arch/mips/kernel/irq.c >>> do_IRQ >>> >>> arch/mips/include/asm/stackframe.h >>> SAVE_SOME SAVE_TEMP get/set_saved_sp >>> >>> arch/mips/include/asm/time.h >>> clocksource_set_clock >>> >>> arch/mips/kernel/process.c >>> cpu_idle >>> >>> arch/mips/kernel/smtc.c >>> __irq_entry >>> ipi_decode >>> SMTC_CLOCK_TICK >>> >>> >>> Enclosed are the two subsets of files for a more expert look. >>> >>> I'll try to look in more detail after Christmas. >>> >>> >>> Cheers, >>> >>> Stuart >>> >>> >>> >>> >

[-- Attachment #2: smtc_stackframe.h.patch --]
[-- Type: text/plain, Size: 394 bytes --]

--- stackframe.h 2010-12-24 06:47:06.000000000 -0800
+++ stackframe.h.test 2010-12-24 06:48:56.000000000 -0800
@@ -195,9 +195,9 @@
          * to cover the pipeline delay.
          */
         .set    mips32
-        mfc0    v1, CP0_TCSTATUS
+        mfc0    v0, CP0_TCSTATUS
         .set    mips0
-        LONG_S  v1, PT_TCSTATUS(sp)
+        LONG_S  v0, PT_TCSTATUS(sp)
 #endif /* CONFIG_MIPS_MT_SMTC */
         LONG_S  $4, PT_R4(sp)
         LONG_S  $5, PT_R5(sp)

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-24 14:53 ` Kevin D. Kissell @ 2010-12-24 16:02 ` Anoop P A 2010-12-24 23:34 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-24 16:02 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote:
> Excellent! Now, does the attached patch (relative to 2.6.37.11) also
> fix things, while preserving the other fixes and performance enhancements?
>

I have tested that patch on the 2.6.37 branch. It gets past the calibration loop but hangs after switching to the MIPS clocksource:

TC 6 going on-line as CPU 6
Brought up 7 CPUs
bio: create slab <bio-0> at 0
SCSI subsystem initialized
Switching to clocksource MIPS

I presume this is a different issue, as restoring the older file didn't help to get rid of this hang.

diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h
index 58730c5..7fc9f10 100644
--- a/arch/mips/include/asm/stackframe.h
+++ b/arch/mips/include/asm/stackframe.h
@@ -195,9 +195,9 @@
          * to cover the pipeline delay.
          */
         .set    mips32
-        mfc0    v1, CP0_TCSTATUS
+        mfc0    v0, CP0_TCSTATUS
         .set    mips0
-        LONG_S  v1, PT_TCSTATUS(sp)
+        LONG_S  v0, PT_TCSTATUS(sp)
 #endif /* CONFIG_MIPS_MT_SMTC */
         LONG_S  $4, PT_R4(sp)
         LONG_S  $5, PT_R5(sp)

> /K. > > On 12/24/10 6:39 AM, Anoop P A wrote: > > Hi Kevin, Stuart , > > > > Woohooo You guys spotted !. > > > > http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > > the culprit > > > > Once I restored previous version of stackframe.h 2.6.33-stable started > > booting !. > > > > Thanks, > > Anoop > > > > On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > >> Thank you, Stuart! I've spotted some definite breakage to SMTC between > >> those versions.
In arch/mips/include/asm/stackframe.h, someone moved > >> the store of the Status register value in SAVE_SOME (line 169 or 204, > >> depending on the version) from two instructions after the mfc0 to a > >> point after the #ifdef for SMTC, presumably to get better pipelining of > >> the register access. Unfortunately, the v1 register is also used in the > >> SMTC-specific fragment to save TCStatus, so the Status value gets > >> clobbered before it gets stored. This will eventually result in the > >> Status register getting a TCStatus value, which has some bits on common, > >> but isn't identical and sooner or later Bad Things will happen. > >> > >> I'm a little surprised this wasn't caught by visual inspection of the patch. > >> > >> Possible solutions would include reverting the store of the CP0_STATUS > >> value to the block above the #ifdef, or, to retain whatever performance > >> advantage was obtained by moving the store downward, to use v0/$2 > >> instead of v1/$3, as the staging register for the TCStatus value. I'd > >> lean toward the second option, but I'm not in a position to test and > >> submit a patch just now. > >> > >> Regards, > >> > >> Kevin K. > >> > >> On 12/23/10 1:09 PM, STUART VENTERS wrote: > >>> Kevin, > >>> > >>> I'm not sure if it's useful, > >>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
> >>> works 2.6.32-stable with patch 804 > >>> works_not 2.6.33-stable > >>> > >>> greping for files with CONFIG_MIPS_MT_SMTC > >>> and looking for timer interrupt related stuff found the following differences: > >>> > >>> > >>> arch/mips/include/asm/irq.h > >>> arch/mips/kernel/irq.c > >>> do_IRQ > >>> > >>> arch/mips/include/asm/stackframe.h > >>> SAVE_SOME SAVE_TEMP get/set_saved_sp > >>> > >>> arch/mips/include/asm/time.h > >>> clocksource_set_clock > >>> > >>> arch/mips/kernel/process.c > >>> cpu_idle > >>> > >>> arch/mips/kernel/smtc.c > >>> __irq_entry > >>> ipi_decode > >>> SMTC_CLOCK_TICK > >>> > >>> > >>> Enclosed are the two subsets of files for a more expert look. > >>> > >>> I'll try to look in more detail after Christmas. > >>> > >>> > >>> Cheers, > >>> > >>> Stuart > >>> > >>> > >>> > >>> > > > ^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-24 16:02 ` Anoop P A @ 2010-12-24 23:34 ` Kevin D. Kissell 2010-12-25 7:32 ` Anoop P A 2010-12-27 15:49 ` STUART VENTERS 0 siblings, 2 replies; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-24 23:34 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips Ah, well, at least we have a stackframe.h fix that preserves David's performance tweak for the deeper pipelined processors. In looking for this, I did notice that someone did some modification to the SMTC clock tick logic that I was skeptical had ever been tested. If you've still got that kernel binary handy, you might check to see if it boots with maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. Oh, yes, and Merry Christmas one and all! Regards, Kevin K. On 12/24/10 8:02 AM, Anoop P A wrote: > On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: >> Excellent! Now, does the attached patch (relative to 2.6.37.11) also >> fix things, while preserving the other fixes and performance enhancements? >> > I have tested that patch with 2.6.37 branch it well passes calibration > loop but hangs after switching to mips closource > > TC 6 going on-line as CPU 6 > Brought up 7 CPUs > bio: create slab<bio-0> at 0 > SCSI subsystem initialized > Switching to clocksource MIPS > > I Presume this is a different issue as restoring older file didn't help > much to get rid of this hang. > > diff --git a/arch/mips/include/asm/stackframe.h > b/arch/mips/include/asm/stackframe.h > index 58730c5..7fc9f10 100644 > --- a/arch/mips/include/asm/stackframe.h > +++ b/arch/mips/include/asm/stackframe.h > @@ -195,9 +195,9 @@ > * to cover the pipeline delay. > */ > .set mips32 > - mfc0 v1, CP0_TCSTATUS > + mfc0 v0, CP0_TCSTATUS > .set mips0 > - LONG_S v1, PT_TCSTATUS(sp) > + LONG_S v0, PT_TCSTATUS(sp) > #endif /* CONFIG_MIPS_MT_SMTC */ > LONG_S $4, PT_R4(sp) > LONG_S $5, PT_R5(sp) > > >> /K. 
>> >> On 12/24/10 6:39 AM, Anoop P A wrote: >>> Hi Kevin, Stuart , >>> >>> Woohooo You guys spotted !. >>> >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be >>> the culprit >>> >>> Once I restored previous version of stackframe.h 2.6.33-stable started >>> booting !. >>> >>> Thanks, >>> Anoop >>> >>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: >>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between >>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved >>>> the store of the Status register value in SAVE_SOME (line 169 or 204, >>>> depending on the version) from two instructions after the mfc0 to a >>>> point after the #ifdef for SMTC, presumably to get better pipelining of >>>> the register access. Unfortunately, the v1 register is also used in the >>>> SMTC-specific fragment to save TCStatus, so the Status value gets >>>> clobbered before it gets stored. This will eventually result in the >>>> Status register getting a TCStatus value, which has some bits on common, >>>> but isn't identical and sooner or later Bad Things will happen. >>>> >>>> I'm a little surprised this wasn't caught by visual inspection of the patch. >>>> >>>> Possible solutions would include reverting the store of the CP0_STATUS >>>> value to the block above the #ifdef, or, to retain whatever performance >>>> advantage was obtained by moving the store downward, to use v0/$2 >>>> instead of v1/$3, as the staging register for the TCStatus value. I'd >>>> lean toward the second option, but I'm not in a position to test and >>>> submit a patch just now. >>>> >>>> Regards, >>>> >>>> Kevin K. >>>> >>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: >>>>> Kevin, >>>>> >>>>> I'm not sure if it's useful, >>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
>>>>> works 2.6.32-stable with patch 804 >>>>> works_not 2.6.33-stable >>>>> >>>>> greping for files with CONFIG_MIPS_MT_SMTC >>>>> and looking for timer interrupt related stuff found the following differences: >>>>> >>>>> >>>>> arch/mips/include/asm/irq.h >>>>> arch/mips/kernel/irq.c >>>>> do_IRQ >>>>> >>>>> arch/mips/include/asm/stackframe.h >>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp >>>>> >>>>> arch/mips/include/asm/time.h >>>>> clocksource_set_clock >>>>> >>>>> arch/mips/kernel/process.c >>>>> cpu_idle >>>>> >>>>> arch/mips/kernel/smtc.c >>>>> __irq_entry >>>>> ipi_decode >>>>> SMTC_CLOCK_TICK >>>>> >>>>> >>>>> Enclosed are the two subsets of files for a more expert look. >>>>> >>>>> I'll try to look in more detail after Christmas. >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> Stuart >>>>> >>>>> >>>>> >>>>> > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-24 23:34 ` Kevin D. Kissell @ 2010-12-25 7:32 ` Anoop P A 2010-12-25 15:17 ` Kevin D. Kissell 2010-12-27 15:49 ` STUART VENTERS 1 sibling, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-25 7:32 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

On Fri, 2010-12-24 at 15:34 -0800, Kevin D. Kissell wrote:
> Ah, well, at least we have a stackframe.h fix that preserves David's
> performance tweak for the deeper pipelined processors. In looking for
> this, I did notice that someone did some modification to the SMTC clock
> tick logic that I was skeptical had ever been tested. If you've still
> got that kernel binary handy, you might check to see if it boots with
> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2.

Yes, I have tried various combinations of TCs and VPEs. With maxvpes=1 I can boot with a maximum of 4 TCs (VPE0 has 4 TCs). However, setting maxvpes=2 and maxtcs=2 hangs pretty early:

Clock rate set to 600000000
console [ttyS0] enabled
Calibrating delay loop... 398.33 BogoMIPS (lpj=796672)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
Limit of 2 VPEs set
Limit of 2 TCs set
TLB of 64 entry pairs shared by 2 VPEs
VPE 0: TC 0, VPE 1: TC 1
IPI buffer pool of 32 buffers
CPU revision is: 00019548 ((null))
TC 1 going on-line as CPU 1
Brought up 2 CPUs

One strange observation is that with maxtcs=3 and maxvpes=2 the kernel boots all the way. With maxtcs=5 and maxvpes=2 it again hangs after switching to the MIPS clocksource.

I strongly suspect some issue with locking. I will dig into the code early next week.

> > Oh, yes, and Merry Christmas one and all!

Thank you! Happy Christmas, everybody.

> > Regards, > > Kevin K. > > On 12/24/10 8:02 AM, Anoop P A wrote: > > On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > >> Excellent!
Now, does the attached patch (relative to 2.6.37.11) also > >> fix things, while preserving the other fixes and performance enhancements? > >> > > I have tested that patch with 2.6.37 branch it well passes calibration > > loop but hangs after switching to mips closource > > > > TC 6 going on-line as CPU 6 > > Brought up 7 CPUs > > bio: create slab<bio-0> at 0 > > SCSI subsystem initialized > > Switching to clocksource MIPS > > > > I Presume this is a different issue as restoring older file didn't help > > much to get rid of this hang. > > > > diff --git a/arch/mips/include/asm/stackframe.h > > b/arch/mips/include/asm/stackframe.h > > index 58730c5..7fc9f10 100644 > > --- a/arch/mips/include/asm/stackframe.h > > +++ b/arch/mips/include/asm/stackframe.h > > @@ -195,9 +195,9 @@ > > * to cover the pipeline delay. > > */ > > .set mips32 > > - mfc0 v1, CP0_TCSTATUS > > + mfc0 v0, CP0_TCSTATUS > > .set mips0 > > - LONG_S v1, PT_TCSTATUS(sp) > > + LONG_S v0, PT_TCSTATUS(sp) > > #endif /* CONFIG_MIPS_MT_SMTC */ > > LONG_S $4, PT_R4(sp) > > LONG_S $5, PT_R5(sp) > > > > > >> /K. > >> > >> On 12/24/10 6:39 AM, Anoop P A wrote: > >>> Hi Kevin, Stuart , > >>> > >>> Woohooo You guys spotted !. > >>> > >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > >>> the culprit > >>> > >>> Once I restored previous version of stackframe.h 2.6.33-stable started > >>> booting !. > >>> > >>> Thanks, > >>> Anoop > >>> > >>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > >>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between > >>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved > >>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > >>>> depending on the version) from two instructions after the mfc0 to a > >>>> point after the #ifdef for SMTC, presumably to get better pipelining of > >>>> the register access. 
Unfortunately, the v1 register is also used in the > >>>> SMTC-specific fragment to save TCStatus, so the Status value gets > >>>> clobbered before it gets stored. This will eventually result in the > >>>> Status register getting a TCStatus value, which has some bits on common, > >>>> but isn't identical and sooner or later Bad Things will happen. > >>>> > >>>> I'm a little surprised this wasn't caught by visual inspection of the patch. > >>>> > >>>> Possible solutions would include reverting the store of the CP0_STATUS > >>>> value to the block above the #ifdef, or, to retain whatever performance > >>>> advantage was obtained by moving the store downward, to use v0/$2 > >>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > >>>> lean toward the second option, but I'm not in a position to test and > >>>> submit a patch just now. > >>>> > >>>> Regards, > >>>> > >>>> Kevin K. > >>>> > >>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > >>>>> Kevin, > >>>>> > >>>>> I'm not sure if it's useful, > >>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. > >>>>> works 2.6.32-stable with patch 804 > >>>>> works_not 2.6.33-stable > >>>>> > >>>>> greping for files with CONFIG_MIPS_MT_SMTC > >>>>> and looking for timer interrupt related stuff found the following differences: > >>>>> > >>>>> > >>>>> arch/mips/include/asm/irq.h > >>>>> arch/mips/kernel/irq.c > >>>>> do_IRQ > >>>>> > >>>>> arch/mips/include/asm/stackframe.h > >>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > >>>>> > >>>>> arch/mips/include/asm/time.h > >>>>> clocksource_set_clock > >>>>> > >>>>> arch/mips/kernel/process.c > >>>>> cpu_idle > >>>>> > >>>>> arch/mips/kernel/smtc.c > >>>>> __irq_entry > >>>>> ipi_decode > >>>>> SMTC_CLOCK_TICK > >>>>> > >>>>> > >>>>> Enclosed are the two subsets of files for a more expert look. > >>>>> > >>>>> I'll try to look in more detail after Christmas. 
> >>>>> > >>>>> > >>>>> Cheers, > >>>>> > >>>>> Stuart > >>>>> > >>>>> > >>>>> > >>>>> > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-25 7:32 ` Anoop P A @ 2010-12-25 15:17 ` Kevin D. Kissell 0 siblings, 0 replies; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-25 15:17 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 12/24/10 11:32 PM, Anoop P A wrote: > On Fri, 2010-12-24 at 15:34 -0800, Kevin D. Kissell wrote: >> Ah, well, at least we have a stackframe.h fix that preserves David's >> performance tweak for the deeper pipelined processors. In looking for >> this, I did notice that someone did some modification to the SMTC clock >> tick logic that I was skeptical had ever been tested. If you've still >> got that kernel binary handy, you might check to see if it boots with >> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > Yes I have tried with various combinations of tcs and vpes. with > maxvpes=1 I can boot with a max of 4 TCS ( VPE0 has 4 TCs) . > However setting maxpes=2 and maxtcs=2 hangs pretty early. > > Clock rate set to 600000000 > console [ttyS0] enabled > Calibrating delay loop... 398.33 BogoMIPS (lpj=796672) > pid_max: default: 32768 minimum: 301 > Mount-cache hash table entries: 512 > Limit of 2 VPEs set > Limit of 2 TCs set > TLB of 64 entry pairs shared by 2 VPEs > VPE 0: TC 0, VPE 1: TC 1 > IPI buffer pool of 32 buffers > CPU revision is: 00019548 ((null)) > TC 1 going on-line as CPU 1 > Brought up 2 CPUs > > One strange observation is with maxtcs=3 and maxvpes=2 kernel boots all > the way. > > Again with maxtcs=5 and maxvpes=2 it hangs after switching to MIPS > clocksource. > > I strongly suspect some issue with locking. I will dig the code early > next week. If locking is screwed up, I'd expect more problems with 4 TC "CPUs" in the same VPE. Your results also suggest that the basic distribution via local low-latency IPI within a VPE is functioning, but that something is broken in the cross-VPE event propagation. 
I strongly suspect that your maxtcs=3, maxvpes=2 case would hang sooner or later, but by luck of the draw none of the init threads got scheduled on VPE 1 long enough to get stuck. I note that there were some changes made under the rubric "MIPS: SMTC: Avoid queueing multiple reschedule IPIs" in October and November of last year that make me nervous. I wouldn't have coded things that way myself, but they might be OK. Still, the first bisection I'd make if I was trouble-shooting this would be to roll back to just before they went in. Ho, ho, ho, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
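For readers following the stackframe.h fix mentioned above, the register clobber it addresses can be sketched as a toy C model. This is only an illustration under stated assumptions: `struct saved_frame`, `save_some_clobbered()`, and `save_some_fixed()` are hypothetical names invented here, the two input values are arbitrary samples, and the real SAVE_SOME macro is MIPS assembly that does considerably more than this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two pt_regs slots involved; not the kernel's struct. */
struct saved_frame {
    uint32_t status;    /* slot written by LONG_S ..., PT_STATUS(sp)   */
    uint32_t tcstatus;  /* slot written by LONG_S ..., PT_TCSTATUS(sp) */
};

/* Ordering with the bug: v1 is reloaded with TCStatus before Status is stored. */
static struct saved_frame save_some_clobbered(uint32_t cp0_status,
                                              uint32_t cp0_tcstatus)
{
    struct saved_frame f;
    uint32_t v1;

    v1 = cp0_status;     /* mfc0   v1, CP0_STATUS                    */
    v1 = cp0_tcstatus;   /* mfc0   v1, CP0_TCSTATUS (overwrites v1)  */
    f.tcstatus = v1;     /* LONG_S v1, PT_TCSTATUS(sp)               */
    f.status = v1;       /* LONG_S v1, PT_STATUS(sp): TCStatus value */
    return f;
}

/* Ordering from the posted patch: TCStatus is staged in v0, so v1 survives. */
static struct saved_frame save_some_fixed(uint32_t cp0_status,
                                          uint32_t cp0_tcstatus)
{
    struct saved_frame f;
    uint32_t v0, v1;

    v1 = cp0_status;     /* mfc0   v1, CP0_STATUS                    */
    v0 = cp0_tcstatus;   /* mfc0   v0, CP0_TCSTATUS                  */
    f.tcstatus = v0;     /* LONG_S v0, PT_TCSTATUS(sp)               */
    f.status = v1;       /* LONG_S v1, PT_STATUS(sp): Status value   */
    return f;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Arbitrary sample register contents, chosen only to be distinguishable. */
    const uint32_t status = 0x11004201u, tcstatus = 0x18102000u;
    struct saved_frame bad = save_some_clobbered(status, tcstatus);
    struct saved_frame good = save_some_fixed(status, tcstatus);

    printf("clobbered: status=%08x tcstatus=%08x\n",
           (unsigned)bad.status, (unsigned)bad.tcstatus);
    printf("fixed:     status=%08x tcstatus=%08x\n",
           (unsigned)good.status, (unsigned)good.tcstatus);
    return 0;
}
#endif
```

Building with `-DDEMO_MAIN` prints both frames. In the clobbered ordering the Status slot ends up holding the TCStatus value, which is the condition Kevin described as eventually restoring a TCStatus value into the Status register.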
* RE: SMTC support status in latest git head. @ 2010-12-27 15:49 ` STUART VENTERS 0 siblings, 0 replies; 68+ messages in thread From: STUART VENTERS @ 2010-12-27 15:49 UTC (permalink / raw) To: Kevin D. Kissell, Anoop P A; +Cc: Anoop P.A., linux-mips Kevin, Outstanding, sometimes it's better to be lucky than good. Anoop, Maybe we can get lucky again. If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, I'll be happy to do another diff. Hope you'll have had a good Christmas as well. We've had snow in Alabama since Christmas eve! Regards, Stuart -----Original Message----- From: Kevin D. Kissell [mailto:kevink@paralogos.com] Sent: Friday, December 24, 2010 5:34 PM To: Anoop P A Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org Subject: Re: SMTC support status in latest git head. Ah, well, at least we have a stackframe.h fix that preserves David's performance tweak for the deeper pipelined processors. In looking for this, I did notice that someone did some modification to the SMTC clock tick logic that I was skeptical had ever been tested. If you've still got that kernel binary handy, you might check to see if it boots with maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. Oh, yes, and Merry Christmas one and all! Regards, Kevin K. On 12/24/10 8:02 AM, Anoop P A wrote: > On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: >> Excellent! Now, does the attached patch (relative to 2.6.37.11) also >> fix things, while preserving the other fixes and performance enhancements? >> > I have tested that patch with 2.6.37 branch it well passes calibration > loop but hangs after switching to mips closource > > TC 6 going on-line as CPU 6 > Brought up 7 CPUs > bio: create slab<bio-0> at 0 > SCSI subsystem initialized > Switching to clocksource MIPS > > I Presume this is a different issue as restoring older file didn't help > much to get rid of this hang. 
> > diff --git a/arch/mips/include/asm/stackframe.h > b/arch/mips/include/asm/stackframe.h > index 58730c5..7fc9f10 100644 > --- a/arch/mips/include/asm/stackframe.h > +++ b/arch/mips/include/asm/stackframe.h > @@ -195,9 +195,9 @@ > * to cover the pipeline delay. > */ > .set mips32 > - mfc0 v1, CP0_TCSTATUS > + mfc0 v0, CP0_TCSTATUS > .set mips0 > - LONG_S v1, PT_TCSTATUS(sp) > + LONG_S v0, PT_TCSTATUS(sp) > #endif /* CONFIG_MIPS_MT_SMTC */ > LONG_S $4, PT_R4(sp) > LONG_S $5, PT_R5(sp) > > >> /K. >> >> On 12/24/10 6:39 AM, Anoop P A wrote: >>> Hi Kevin, Stuart , >>> >>> Woohooo You guys spotted !. >>> >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be >>> the culprit >>> >>> Once I restored previous version of stackframe.h 2.6.33-stable started >>> booting !. >>> >>> Thanks, >>> Anoop >>> >>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: >>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between >>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved >>>> the store of the Status register value in SAVE_SOME (line 169 or 204, >>>> depending on the version) from two instructions after the mfc0 to a >>>> point after the #ifdef for SMTC, presumably to get better pipelining of >>>> the register access. Unfortunately, the v1 register is also used in the >>>> SMTC-specific fragment to save TCStatus, so the Status value gets >>>> clobbered before it gets stored. This will eventually result in the >>>> Status register getting a TCStatus value, which has some bits on common, >>>> but isn't identical and sooner or later Bad Things will happen. >>>> >>>> I'm a little surprised this wasn't caught by visual inspection of the patch. 
>>>> >>>> Possible solutions would include reverting the store of the CP0_STATUS >>>> value to the block above the #ifdef, or, to retain whatever performance >>>> advantage was obtained by moving the store downward, to use v0/$2 >>>> instead of v1/$3, as the staging register for the TCStatus value. I'd >>>> lean toward the second option, but I'm not in a position to test and >>>> submit a patch just now. >>>> >>>> Regards, >>>> >>>> Kevin K. >>>> >>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: >>>>> Kevin, >>>>> >>>>> I'm not sure if it's useful, >>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. >>>>> works 2.6.32-stable with patch 804 >>>>> works_not 2.6.33-stable >>>>> >>>>> greping for files with CONFIG_MIPS_MT_SMTC >>>>> and looking for timer interrupt related stuff found the following differences: >>>>> >>>>> >>>>> arch/mips/include/asm/irq.h >>>>> arch/mips/kernel/irq.c >>>>> do_IRQ >>>>> >>>>> arch/mips/include/asm/stackframe.h >>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp >>>>> >>>>> arch/mips/include/asm/time.h >>>>> clocksource_set_clock >>>>> >>>>> arch/mips/kernel/process.c >>>>> cpu_idle >>>>> >>>>> arch/mips/kernel/smtc.c >>>>> __irq_entry >>>>> ipi_decode >>>>> SMTC_CLOCK_TICK >>>>> >>>>> >>>>> Enclosed are the two subsets of files for a more expert look. >>>>> >>>>> I'll try to look in more detail after Christmas. >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> Stuart >>>>> >>>>> >>>>> >>>>> > ^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: SMTC support status in latest git head. 2010-12-27 15:49 ` STUART VENTERS (?) @ 2010-12-27 17:19 ` Anoop P A 2010-12-28 8:19 ` Anoop P A -1 siblings, 1 reply; 68+ messages in thread From: Anoop P A @ 2010-12-27 17:19 UTC (permalink / raw) To: STUART VENTERS; +Cc: Kevin D. Kissell, Anoop P.A., linux-mips Hi Kevin, It is very unlikely that the patch you pointed to has any impact on the hang I am seeing. The patch you mentioned went into the kernel around the 2.6.32 timeframe. I am able to boot both the 2.6.32 and 2.6.33 kernels (+ stackframe patch). Hi Stuart, I haven't had much time to spend on this today. I got 2.6.36-stable (+ stackframe patch) booting yesterday, and I have observed the hang with 2.6.37-rc1 (same as rc6 and current git head). So some patch in the 2.6.37 branch probably introduced this hang. Hopefully I will get a free slot tomorrow so that I can look into the code diff. Thanks Anoop On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: > Kevin, > > Outstanding, sometimes it's better to be lucky than good. > > > Anoop, > > Maybe we can get lucky again. > > If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, > I'll be happy to do another diff. > > > Hope you'll have had a good Christmas as well. > We've had snow in Alabama since Christmas eve! > > > Regards, > > Stuart > > > -----Original Message----- > From: Kevin D. Kissell [mailto:kevink@paralogos.com] > Sent: Friday, December 24, 2010 5:34 PM > To: Anoop P A > Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org > Subject: Re: SMTC support status in latest git head. > > > Ah, well, at least we have a stackframe.h fix that preserves David's > performance tweak for the deeper pipelined processors. In looking for > this, I did notice that someone did some modification to the SMTC clock > tick logic that I was skeptical had ever been tested. 
If you've still > got that kernel binary handy, you might check to see if it boots with > maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > > Oh, yes, and Merry Christmas one and all! > > Regards, > > Kevin K. > > On 12/24/10 8:02 AM, Anoop P A wrote: > > On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > >> Excellent! Now, does the attached patch (relative to 2.6.37.11) also > >> fix things, while preserving the other fixes and performance enhancements? > >> > > I have tested that patch with 2.6.37 branch it well passes calibration > > loop but hangs after switching to mips closource > > > > TC 6 going on-line as CPU 6 > > Brought up 7 CPUs > > bio: create slab<bio-0> at 0 > > SCSI subsystem initialized > > Switching to clocksource MIPS > > > > I Presume this is a different issue as restoring older file didn't help > > much to get rid of this hang. > > > > diff --git a/arch/mips/include/asm/stackframe.h > > b/arch/mips/include/asm/stackframe.h > > index 58730c5..7fc9f10 100644 > > --- a/arch/mips/include/asm/stackframe.h > > +++ b/arch/mips/include/asm/stackframe.h > > @@ -195,9 +195,9 @@ > > * to cover the pipeline delay. > > */ > > .set mips32 > > - mfc0 v1, CP0_TCSTATUS > > + mfc0 v0, CP0_TCSTATUS > > .set mips0 > > - LONG_S v1, PT_TCSTATUS(sp) > > + LONG_S v0, PT_TCSTATUS(sp) > > #endif /* CONFIG_MIPS_MT_SMTC */ > > LONG_S $4, PT_R4(sp) > > LONG_S $5, PT_R5(sp) > > > > > >> /K. > >> > >> On 12/24/10 6:39 AM, Anoop P A wrote: > >>> Hi Kevin, Stuart , > >>> > >>> Woohooo You guys spotted !. > >>> > >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > >>> the culprit > >>> > >>> Once I restored previous version of stackframe.h 2.6.33-stable started > >>> booting !. > >>> > >>> Thanks, > >>> Anoop > >>> > >>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > >>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between > >>>> those versions. 
In arch/mips/include/asm/stackframe.h, someone moved > >>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > >>>> depending on the version) from two instructions after the mfc0 to a > >>>> point after the #ifdef for SMTC, presumably to get better pipelining of > >>>> the register access. Unfortunately, the v1 register is also used in the > >>>> SMTC-specific fragment to save TCStatus, so the Status value gets > >>>> clobbered before it gets stored. This will eventually result in the > >>>> Status register getting a TCStatus value, which has some bits on common, > >>>> but isn't identical and sooner or later Bad Things will happen. > >>>> > >>>> I'm a little surprised this wasn't caught by visual inspection of the patch. > >>>> > >>>> Possible solutions would include reverting the store of the CP0_STATUS > >>>> value to the block above the #ifdef, or, to retain whatever performance > >>>> advantage was obtained by moving the store downward, to use v0/$2 > >>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > >>>> lean toward the second option, but I'm not in a position to test and > >>>> submit a patch just now. > >>>> > >>>> Regards, > >>>> > >>>> Kevin K. > >>>> > >>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > >>>>> Kevin, > >>>>> > >>>>> I'm not sure if it's useful, > >>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
> >>>>> works 2.6.32-stable with patch 804 > >>>>> works_not 2.6.33-stable > >>>>> > >>>>> greping for files with CONFIG_MIPS_MT_SMTC > >>>>> and looking for timer interrupt related stuff found the following differences: > >>>>> > >>>>> > >>>>> arch/mips/include/asm/irq.h > >>>>> arch/mips/kernel/irq.c > >>>>> do_IRQ > >>>>> > >>>>> arch/mips/include/asm/stackframe.h > >>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > >>>>> > >>>>> arch/mips/include/asm/time.h > >>>>> clocksource_set_clock > >>>>> > >>>>> arch/mips/kernel/process.c > >>>>> cpu_idle > >>>>> > >>>>> arch/mips/kernel/smtc.c > >>>>> __irq_entry > >>>>> ipi_decode > >>>>> SMTC_CLOCK_TICK > >>>>> > >>>>> > >>>>> Enclosed are the two subsets of files for a more expert look. > >>>>> > >>>>> I'll try to look in more detail after Christmas. > >>>>> > >>>>> > >>>>> Cheers, > >>>>> > >>>>> Stuart > >>>>> > >>>>> > >>>>> > >>>>> > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* RE: SMTC support status in latest git head. 2010-12-27 17:19 ` Anoop P A @ 2010-12-28 8:19 ` Anoop P A 2010-12-28 8:43 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread From: Anoop P A @ 2010-12-28 8:19 UTC (permalink / raw) To: STUART VENTERS; +Cc: Kevin D. Kissell, Anoop P.A., linux-mips Hi, I had a glance at the code diff without noticing any suspicious code. Tracing the hang showed that it hangs in the timekeeping_notify function. Thanks, Anoop PS: I may not be available until Thursday On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: > Hi Kevin, > > It is very unlikely that the patch you pointed has any impact on the the > hang I am seeing. The patch you have mentioned got into kernel around > 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + > stackframe patch) . > > Hi Stuart, > > I haven't got much time to spend on this today. > > I had got 2.6.36-stable(+ stack frame patch) booting last day and I have > observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) > > So probably some patches in 2.6.37 branch introduced this hang. > > Hopefully I will get some free slot tomorrow so that I can look into > code diff . > > Thanks > Anoop > > On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: > > Kevin, > > > > Outstanding, sometimes it's better to be lucky than good. > > > > > > Anoop, > > > > Maybe we can get lucky again. > > > > If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, > > I'll be happy to do another diff. > > > > > > Hope you'll have had a good Christmas as well. > > We've had snow in Alabama since Christmas eve! > > > > > > Regards, > > > > Stuart > > > > > > -----Original Message----- > > From: Kevin D. Kissell [mailto:kevink@paralogos.com] > > Sent: Friday, December 24, 2010 5:34 PM > > To: Anoop P A > > Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org > > Subject: Re: SMTC support status in latest git head. 
> > > > > > Ah, well, at least we have a stackframe.h fix that preserves David's > > performance tweak for the deeper pipelined processors. In looking for > > this, I did notice that someone did some modification to the SMTC clock > > tick logic that I was skeptical had ever been tested. If you've still > > got that kernel binary handy, you might check to see if it boots with > > maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > > > > Oh, yes, and Merry Christmas one and all! > > > > Regards, > > > > Kevin K. > > > > On 12/24/10 8:02 AM, Anoop P A wrote: > > > On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > > >> Excellent! Now, does the attached patch (relative to 2.6.37.11) also > > >> fix things, while preserving the other fixes and performance enhancements? > > >> > > > I have tested that patch with 2.6.37 branch it well passes calibration > > > loop but hangs after switching to mips closource > > > > > > TC 6 going on-line as CPU 6 > > > Brought up 7 CPUs > > > bio: create slab<bio-0> at 0 > > > SCSI subsystem initialized > > > Switching to clocksource MIPS > > > > > > I Presume this is a different issue as restoring older file didn't help > > > much to get rid of this hang. > > > > > > diff --git a/arch/mips/include/asm/stackframe.h > > > b/arch/mips/include/asm/stackframe.h > > > index 58730c5..7fc9f10 100644 > > > --- a/arch/mips/include/asm/stackframe.h > > > +++ b/arch/mips/include/asm/stackframe.h > > > @@ -195,9 +195,9 @@ > > > * to cover the pipeline delay. > > > */ > > > .set mips32 > > > - mfc0 v1, CP0_TCSTATUS > > > + mfc0 v0, CP0_TCSTATUS > > > .set mips0 > > > - LONG_S v1, PT_TCSTATUS(sp) > > > + LONG_S v0, PT_TCSTATUS(sp) > > > #endif /* CONFIG_MIPS_MT_SMTC */ > > > LONG_S $4, PT_R4(sp) > > > LONG_S $5, PT_R5(sp) > > > > > > > > >> /K. > > >> > > >> On 12/24/10 6:39 AM, Anoop P A wrote: > > >>> Hi Kevin, Stuart , > > >>> > > >>> Woohooo You guys spotted !. 
> > >>> > > >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > > >>> the culprit > > >>> > > >>> Once I restored previous version of stackframe.h 2.6.33-stable started > > >>> booting !. > > >>> > > >>> Thanks, > > >>> Anoop > > >>> > > >>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > > >>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between > > >>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved > > >>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > > >>>> depending on the version) from two instructions after the mfc0 to a > > >>>> point after the #ifdef for SMTC, presumably to get better pipelining of > > >>>> the register access. Unfortunately, the v1 register is also used in the > > >>>> SMTC-specific fragment to save TCStatus, so the Status value gets > > >>>> clobbered before it gets stored. This will eventually result in the > > >>>> Status register getting a TCStatus value, which has some bits on common, > > >>>> but isn't identical and sooner or later Bad Things will happen. > > >>>> > > >>>> I'm a little surprised this wasn't caught by visual inspection of the patch. > > >>>> > > >>>> Possible solutions would include reverting the store of the CP0_STATUS > > >>>> value to the block above the #ifdef, or, to retain whatever performance > > >>>> advantage was obtained by moving the store downward, to use v0/$2 > > >>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > > >>>> lean toward the second option, but I'm not in a position to test and > > >>>> submit a patch just now. > > >>>> > > >>>> Regards, > > >>>> > > >>>> Kevin K. > > >>>> > > >>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > > >>>>> Kevin, > > >>>>> > > >>>>> I'm not sure if it's useful, > > >>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
> > >>>>> works 2.6.32-stable with patch 804 > > >>>>> works_not 2.6.33-stable > > >>>>> > > >>>>> greping for files with CONFIG_MIPS_MT_SMTC > > >>>>> and looking for timer interrupt related stuff found the following differences: > > >>>>> > > >>>>> > > >>>>> arch/mips/include/asm/irq.h > > >>>>> arch/mips/kernel/irq.c > > >>>>> do_IRQ > > >>>>> > > >>>>> arch/mips/include/asm/stackframe.h > > >>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > > >>>>> > > >>>>> arch/mips/include/asm/time.h > > >>>>> clocksource_set_clock > > >>>>> > > >>>>> arch/mips/kernel/process.c > > >>>>> cpu_idle > > >>>>> > > >>>>> arch/mips/kernel/smtc.c > > >>>>> __irq_entry > > >>>>> ipi_decode > > >>>>> SMTC_CLOCK_TICK > > >>>>> > > >>>>> > > >>>>> Enclosed are the two subsets of files for a more expert look. > > >>>>> > > >>>>> I'll try to look in more detail after Christmas. > > >>>>> > > >>>>> > > >>>>> Cheers, > > >>>>> > > >>>>> Stuart > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > > > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-28 8:19 ` Anoop P A @ 2010-12-28 8:43 ` Kevin D. Kissell 2010-12-31 12:27 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-28 8:43 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips I took a quick look last night, and the only thing that looked vaguely dangerous in changes since the timer changes I alluded to earlier was the global naming cleanup of irq-related function names that David Howell submitted. The diff didn't look dangerous in itself, but some of the definitions are nested subtly for SMTC to maximize the amount of common code, and I could imagine something getting lost in translation there. If that were really the problem, it would of course affect much more than just the timer subsystem, but early in the boot process, timers are pretty much the only interrupts that have to be handled correctly. I'm travelling today, but will take a look at timekeeping_notify() tomorrow or the next day... /K. On 12/28/10 12:19 AM, Anoop P A wrote: > Hi, > > I had a glance into the code diff without notice of any suspect-able > code . > Tracing the hang showed that it is getting hanged in timekeeping_notify > function. > > Thanks, > Anoop > > PS: I may not be available until Thursday > > On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: >> Hi Kevin, >> >> It is very unlikely that the patch you pointed has any impact on the the >> hang I am seeing. The patch you have mentioned got into kernel around >> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + >> stackframe patch) . >> >> Hi Stuart, >> >> I haven't got much time to spend on this today. >> >> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have >> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) >> >> So probably some patches in 2.6.37 branch introduced this hang. 
>>
>> Hopefully I will get some free slot tomorrow so that I can look into
>> code diff .
>>
>> Thanks
>> Anoop
>>
>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote:
>>> Kevin,
>>>
>>> Outstanding, sometimes it's better to be lucky than good.
>>>
>>>
>>> Anoop,
>>>
>>> Maybe we can get lucky again.
>>>
>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions,
>>> I'll be happy to do another diff.
>>>
>>>
>>> Hope you'll have had a good Christmas as well.
>>> We've had snow in Alabama since Christmas eve!
>>>
>>>
>>> Regards,
>>>
>>> Stuart
>>>
>>>
>>> -----Original Message-----
>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com]
>>> Sent: Friday, December 24, 2010 5:34 PM
>>> To: Anoop P A
>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org
>>> Subject: Re: SMTC support status in latest git head.
>>>
>>>
>>> Ah, well, at least we have a stackframe.h fix that preserves David's
>>> performance tweak for the deeper pipelined processors. In looking for
>>> this, I did notice that someone did some modification to the SMTC clock
>>> tick logic that I was skeptical had ever been tested. If you've still
>>> got that kernel binary handy, you might check to see if it boots with
>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2.
>>>
>>> Oh, yes, and Merry Christmas one and all!
>>>
>>> Regards,
>>>
>>> Kevin K.
>>>
>>> On 12/24/10 8:02 AM, Anoop P A wrote:
>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote:
>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also
>>>>> fix things, while preserving the other fixes and performance enhancements?
>>>>>
>>>> I have tested that patch with 2.6.37 branch it well passes calibration
>>>> loop but hangs after switching to mips closource
>>>>
>>>> TC 6 going on-line as CPU 6
>>>> Brought up 7 CPUs
>>>> bio: create slab<bio-0> at 0
>>>> SCSI subsystem initialized
>>>> Switching to clocksource MIPS
>>>>
>>>> I Presume this is a different issue as restoring older file didn't help
>>>> much to get rid of this hang.
>>>>
>>>> diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h
>>>> index 58730c5..7fc9f10 100644
>>>> --- a/arch/mips/include/asm/stackframe.h
>>>> +++ b/arch/mips/include/asm/stackframe.h
>>>> @@ -195,9 +195,9 @@
>>>>  		 * to cover the pipeline delay.
>>>>  		 */
>>>>  		.set	mips32
>>>> -		mfc0	v1, CP0_TCSTATUS
>>>> +		mfc0	v0, CP0_TCSTATUS
>>>>  		.set	mips0
>>>> -		LONG_S	v1, PT_TCSTATUS(sp)
>>>> +		LONG_S	v0, PT_TCSTATUS(sp)
>>>>  #endif /* CONFIG_MIPS_MT_SMTC */
>>>>  		LONG_S	$4, PT_R4(sp)
>>>>  		LONG_S	$5, PT_R5(sp)
>>>>
>>>>
>>>>> /K.
>>>>>
>>>>> On 12/24/10 6:39 AM, Anoop P A wrote:
>>>>>> Hi Kevin, Stuart ,
>>>>>>
>>>>>> Woohooo You guys spotted !.
>>>>>>
>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be
>>>>>> the culprit
>>>>>>
>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started
>>>>>> booting !.
>>>>>>
>>>>>> Thanks,
>>>>>> Anoop
>>>>>>
>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote:
>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between
>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved
>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204,
>>>>>>> depending on the version) from two instructions after the mfc0 to a
>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of
>>>>>>> the register access. Unfortunately, the v1 register is also used in the
>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets
>>>>>>> clobbered before it gets stored. This will eventually result in the
>>>>>>> Status register getting a TCStatus value, which has some bits on common,
>>>>>>> but isn't identical and sooner or later Bad Things will happen.
>>>>>>>
>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch.
>>>>>>>
>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS
>>>>>>> value to the block above the #ifdef, or, to retain whatever performance
>>>>>>> advantage was obtained by moving the store downward, to use v0/$2
>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd
>>>>>>> lean toward the second option, but I'm not in a position to test and
>>>>>>> submit a patch just now.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Kevin K.
>>>>>>>
>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote:
>>>>>>>> Kevin,
>>>>>>>>
>>>>>>>> I'm not sure if it's useful,
>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out.
>>>>>>>> works 2.6.32-stable with patch 804
>>>>>>>> works_not 2.6.33-stable
>>>>>>>>
>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC
>>>>>>>> and looking for timer interrupt related stuff found the following differences:
>>>>>>>>
>>>>>>>>
>>>>>>>> arch/mips/include/asm/irq.h
>>>>>>>> arch/mips/kernel/irq.c
>>>>>>>>     do_IRQ
>>>>>>>>
>>>>>>>> arch/mips/include/asm/stackframe.h
>>>>>>>>     SAVE_SOME SAVE_TEMP get/set_saved_sp
>>>>>>>>
>>>>>>>> arch/mips/include/asm/time.h
>>>>>>>>     clocksource_set_clock
>>>>>>>>
>>>>>>>> arch/mips/kernel/process.c
>>>>>>>>     cpu_idle
>>>>>>>>
>>>>>>>> arch/mips/kernel/smtc.c
>>>>>>>>     __irq_entry
>>>>>>>>     ipi_decode
>>>>>>>>     SMTC_CLOCK_TICK
>>>>>>>>
>>>>>>>>
>>>>>>>> Enclosed are the two subsets of files for a more expert look.
>>>>>>>>
>>>>>>>> I'll try to look in more detail after Christmas.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Stuart
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread
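The register clobber Kevin describes in SAVE_SOME can be modelled in a few lines of plain C. This is only a sketch for illustration, not kernel code: `struct frame_model`, `save_some_broken()` and `save_some_fixed()` are hypothetical names, and local variables stand in for the v0/v1 registers. Only the ordering of loads and stores follows the description in the quoted message and the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved-register frame (not struct pt_regs). */
struct frame_model {
	uint32_t status;	/* slot that PT_STATUS refers to */
	uint32_t tcstatus;	/* slot that PT_TCSTATUS refers to */
};

/*
 * Broken ordering: the Status store has been moved below the SMTC
 * fragment, but both values are staged in the same register (v1).
 */
static void save_some_broken(struct frame_model *f,
			     uint32_t cp0_status, uint32_t cp0_tcstatus)
{
	uint32_t v1;

	v1 = cp0_status;	/* mfc0 v1, CP0_STATUS */
	v1 = cp0_tcstatus;	/* mfc0 v1, CP0_TCSTATUS: Status is lost here */
	f->tcstatus = v1;	/* LONG_S v1, PT_TCSTATUS(sp) */
	f->status = v1;		/* LONG_S v1, PT_STATUS(sp): stores TCStatus */
}

/* Fixed ordering: TCStatus is staged in v0, so v1 still holds Status. */
static void save_some_fixed(struct frame_model *f,
			    uint32_t cp0_status, uint32_t cp0_tcstatus)
{
	uint32_t v0, v1;

	v1 = cp0_status;	/* mfc0 v1, CP0_STATUS */
	v0 = cp0_tcstatus;	/* mfc0 v0, CP0_TCSTATUS */
	f->tcstatus = v0;	/* LONG_S v0, PT_TCSTATUS(sp) */
	f->status = v1;		/* LONG_S v1, PT_STATUS(sp) */
}
```

Calling both functions with two different sample values shows the difference: the broken variant leaves the TCStatus value in the `status` slot, while the fixed variant keeps the two slots distinct, which is what the v1 to v0 change in the diff is meant to achieve.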
* Re: SMTC support status in latest git head.
  2010-12-28  8:43 ` Kevin D. Kissell
@ 2010-12-31 12:27 ` Anoop P A
  2011-01-01  8:42   ` Kevin D. Kissell
  0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-31 12:27 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

Hi,

The kernel hangs in the stop_machine call. Please find the MT register dump below.

Another important observation: even though the 2.6.33 kernel + stackframe
patch gets well past the calibration hang, I am still unable to boot into an
initramfs root (I verified that the ramfs works with VSMP). So it looks like
there is still some issue to fix between 2.6.32 and 2.6.33.

######################## Log ###########################

=== MIPS MT State Dump ===
-- Global State --
   MVPControl Passed: 00000005
   MVPControl Read: 00000004
   MVPConf0 : a8008406
-- per-VPE State --
  VPE 0
   VPEControl : 00008000
   VPEConf0 : 800f0003
   VPE0.Status : 11004201
   VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108
   VPE0.Cause : 50804000
   VPE0.Config7 : 00010000
  VPE 1
   VPEControl : 00068006
   VPEConf0 : 80cf0003
   VPE1.Status : 11008301
   VPE1.EPC : 801022a0 r4k_wait+0x20/0x40
   VPE1.Cause : 50800000
   VPE1.Config7 : 00010000
-- per-TC State --
  TC 0 (current TC with VPE EPC above)
   TCStatus : 18102000
   TCBind : 00000000
   TCRestart : 803fa19c printk+0xc/0x30
   TCHalt : 00000000
   TCContext : 00000000
  TC 1
   TCStatus : 18902000
   TCBind : 00200000
   TCRestart : 801022a0 r4k_wait+0x20/0x40
   TCHalt : 00000000
   TCContext : 00140000
  TC 2
   TCStatus : 18902000
   TCBind : 00400000
   TCRestart : 801022a0 r4k_wait+0x20/0x40
   TCHalt : 00000000
   TCContext : 00280000
  TC 3
   TCStatus : 18902000
   TCBind : 00600000
   TCRestart : 801022a0 r4k_wait+0x20/0x40
   TCHalt : 00000000
   TCContext : 003c0000
  TC 4
   TCStatus : 18902000
   TCBind : 00800001
   TCRestart : 8010229c r4k_wait+0x1c/0x40
   TCHalt : 00000000
   TCContext : 00500000
  TC 5
   TCStatus : 18902000
   TCBind : 00a00001
   TCRestart : 8010229c r4k_wait+0x1c/0x40
   TCHalt : 00000000
   TCContext : 00640000
  TC 6
   TCStatus : 18902000
   TCBind : 00c00001
   TCRestart : 8010229c r4k_wait+0x1c/0x40
   TCHalt : 00000000
   TCContext : 00780000
Counter Interrupts taken per CPU (TC)
0: 0
1: 0
2: 0
3: 0
4: 0
5: 0
6: 0
7: 0
Self-IPI invocations:
0: 12
1: 0
2: 0
3: 0
4: 0
5: 5
6: 4
7: 0
IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
0 Recoveries of "stolen" FPU
===========================

################################################################

Thanks
Anoop

On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote:
> I took a quick look last night, and the only thing that looked vaguely
> dangerous in changes since the timer changes I alluded to earlier was
> the global naming cleanup of irq-related function names that David
> Howell submitted. The diff didn't look dangerous in itself, but some of
> the definitions are nested subtly for SMTC to maximize the amount of
> common code, and I could imagine something getting lost in translation
> there. If that were really the problem, it would of course affect much
> more than just the timer subsystem, but early in the boot process,
> timers are pretty much the only interrupts that have to be handled
> correctly.
>
> I'm travelling today, but will take a look at timekeeping_notify()
> tomorrow or the next day...
>
> /K.
>
> On 12/28/10 12:19 AM, Anoop P A wrote:
> > Hi,
> >
> > I had a glance into the code diff without notice of any suspect-able
> > code .
> > Tracing the hang showed that it is getting hanged in timekeeping_notify
> > function.
> >
> > Thanks,
> > Anoop
> >
> > PS: I may not be available until Thursday
> >
> > On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote:
> >> Hi Kevin,
> >>
> >> It is very unlikely that the patch you pointed has any impact on the the
> >> hang I am seeing.
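The TCBind values in the dump above can be decoded to see which TC is bound to which VPE. The sketch below assumes the field layout as I recall it from the MIPS MT ASE (CurVPE in bits 3..0, CurTC in bits 28..21); please check it against the architecture manual before relying on it. The macro and function names are local to this example and are not taken from the kernel headers.

```c
#include <stdint.h>

/* Assumed TCBind layout: CurVPE in bits 3..0, CurTC in bits 28..21. */
#define EX_TCBIND_CURVPE_MASK	0x0000000fu
#define EX_TCBIND_CURTC_SHIFT	21
#define EX_TCBIND_CURTC_MASK	(0xffu << EX_TCBIND_CURTC_SHIFT)

/* Number of the TC this TCBind value belongs to. */
static unsigned int tcbind_curtc(uint32_t tcbind)
{
	return (tcbind & EX_TCBIND_CURTC_MASK) >> EX_TCBIND_CURTC_SHIFT;
}

/* Number of the VPE that this TC is bound to. */
static unsigned int tcbind_curvpe(uint32_t tcbind)
{
	return tcbind & EX_TCBIND_CURVPE_MASK;
}
```

Under that assumption, the dump reads as TCs 0 to 3 bound to VPE 0 (for example 00200000 decodes to TC 1 on VPE 0) and TCs 4 to 6 bound to VPE 1 (00c00001 decodes to TC 6 on VPE 1), which matches the two different r4k_wait offsets shown per group.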
* Re: SMTC support status in latest git head.
  2010-12-31 12:27 ` Anoop P A
@ 2011-01-01  8:42 ` Kevin D. Kissell
  2011-01-03 15:12   ` Anoop P A
  0 siblings, 1 reply; 68+ messages in thread
From: Kevin D. Kissell @ 2011-01-01 8:42 UTC (permalink / raw)
  To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips

At this point the logical thing to do would seem to be to look at your kernel
image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0
shows the last exception to have been taken. That's a critical SMTC
routine that gets called whenever an xxx_irq_restore() enables
interrupts, so that virtual per-TC IPI interrupts that were posted while
the TC had interrupts disabled can be handled deterministically. As I
mentioned in an earlier message, there was some cleanup work from David
Howell that changed a number of irq management-related function names
and prototypes across all architectures, which went into linux-mips.org
at very roughly the time of the breakage. The SMTC overlay over the irq
implementation has been pretty robust, but it's written in a perhaps
doomed attempt to be both efficient and to share a maximum amount of common
code with the general case. A mechanical or semi-mechanical change
could conceivably have broken things.

Regards,

Kevin K.


On 12/31/2010 4:27 AM, Anoop P A wrote:
> Hi,
>
> The kernel hangs in the stop_machine call. Please find the MT register dump below.
>
> Another important observation: even though the 2.6.33 kernel + stackframe
> patch gets well past the calibration hang, I am still unable to boot into an
> initramfs root (I verified that the ramfs works with VSMP). So it looks like
> there is still some issue to fix between 2.6.32 and 2.6.33.
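The deferred-IPI idea Kevin describes (IPIs posted while a TC has interrupts disabled are queued, then replayed when interrupts are restored) can be sketched as a small single-threaded C model. This is not the kernel's smtc_ipi_replay(): every name below is hypothetical, there is no locking, and the FIFO replay order is an assumption made for the example.

```c
#include <stddef.h>

#define EX_IPIQ_DEPTH 8	/* arbitrary capacity chosen for this sketch */

/* Hypothetical per-TC state: an enable flag plus a queue of pending IPIs. */
struct tc_model {
	int irqs_enabled;
	int queue[EX_IPIQ_DEPTH];	/* IPI types posted while disabled */
	size_t depth;
	int handled[EX_IPIQ_DEPTH];	/* record of delivered IPIs, for inspection */
	size_t nhandled;
};

/* Deliver one IPI; here it is only recorded. */
static void handle_ipi(struct tc_model *tc, int type)
{
	if (tc->nhandled < EX_IPIQ_DEPTH)
		tc->handled[tc->nhandled++] = type;
}

/* Post an IPI: deliver at once if enabled, otherwise queue it.
 * Returns 0 on success, -1 if the queue is full. */
static int post_ipi(struct tc_model *tc, int type)
{
	if (tc->irqs_enabled) {
		handle_ipi(tc, type);
		return 0;
	}
	if (tc->depth == EX_IPIQ_DEPTH)
		return -1;
	tc->queue[tc->depth++] = type;
	return 0;
}

static void irq_disable_model(struct tc_model *tc)
{
	tc->irqs_enabled = 0;
}

/* Re-enable interrupts, then replay everything queued in the meantime. */
static void irq_restore_model(struct tc_model *tc)
{
	size_t i;

	tc->irqs_enabled = 1;
	for (i = 0; i < tc->depth; i++)
		handle_ipi(tc, tc->queue[i]);
	tc->depth = 0;
}
```

In this model, two IPIs posted between irq_disable_model() and irq_restore_model() are not handled until the restore, and the queue depth drops back to zero afterwards. A replay step that never completes would leave the restore path stuck, which is one reason a hang with the EPC inside the real replay routine is worth disassembling.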
* Re: SMTC support status in latest git head.
  2011-01-01  8:42 ` Kevin D. Kissell
@ 2011-01-03 15:12 ` Anoop P A
  2011-01-03 16:14   ` Kevin D. Kissell
  0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2011-01-03 15:12 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

Hi,

The following patch restricts the TREE_RCU implementation to !PREEMPT SMP
kernels.

http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366

The CONFIG_TREE_PREEMPT_RCU option does not seem to work with an SMTC kernel
(and it will be the only RCU implementation available for an SMTC kernel from
2.6.37 onwards).

With no forced preemption and TREE_RCU selected, I am able to boot further,
up to the hang that I have reported.

Thanks
Anoop

On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote:
> At this point the logical thing to do would seem to be to look at your kernel
> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0
> shows the last exception to have been taken. That's a critical SMTC
> routine that gets called whenever an xxx_irq_restore() enables
> interrupts, so that virtual per-TC IPI interrupts that were posted while
> the TC had interrupts disabled can be handled deterministically. As I
> mentioned in an earlier message, there was some cleanup work from David
> Howell that changed a number of irq management-related function names
> and prototypes across all architectures, which went into linux-mips.org
> at very roughly the time of the breakage. The SMTC overlay over the irq
> implementation has been pretty robust, but it's written in a perhaps
> doomed attempt to be both efficient and to share a maximum amount of common
> code with the general case. A mechanical or semi-mechanical change
> could conceivably have broken things.
>
> Regards,
>
> Kevin K.
>
>
> On 12/31/2010 4:27 AM, Anoop P A wrote:
> > Hi,
> >
> > The kernel hangs in the stop_machine call. Please find the MT register dump below.
* Re: SMTC support status in latest git head. 2011-01-03 15:12 ` Anoop P A @ 2011-01-03 16:14 ` Kevin D. Kissell 2011-01-03 19:20 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread
From: Kevin D. Kissell @ 2011-01-03 16:14 UTC (permalink / raw)
To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips

The very first SMTC implementations didn't support full kernel-mode preemption, which anyway wasn't a priority, given the hardware event response support in MIPS MT. I believe it was later made compatible, but it was never extensively exercised. Since SMTC has fingers in some pretty low-level atomicity mechanisms, if a new, parallel set was implemented for RCU, I can easily imagine that nobody has yet implemented SMTC-ified variants of that set.

Your last statement isn't very clear, though. Are you saying that if you configure for no forced preemption and with TREE_RCU, the 2.6.37 kernel boots all the way up, or that it simply hangs later? What's the last rev kernel that actually boots all the way up?

Regards,

Kevin K.

On 1/3/2011 7:12 AM, Anoop P A wrote:
> Hi ,
>
> Following patch restricts TREE_RCU implementation only for !PREEMPT
> SMP kernel.
> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366
>
> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel
> ( which will be only available RCU implementation for SMTC kernel from
> 2.6.37 onwards) .
>
> With no forced preemption and selecting TREE_RCU I am able to boot
> further to the hang that I have reported.
>
> Thanks
> Anoop
>
> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote:
>> At this point the logical thing to do would seem to look at your kernel
>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0
>> shows the last exception to have been taken.
That's a critical SMTC >> routine that gets called whenever an xxx_irq_restore() enables >> interrupts, so that virtual per-TC IPI interrupts that were posted while >> the TC had interrupts disabled can be handled deterministically. As I >> mentioned in an earlier message, there was some cleanup work from David >> Howell that changed a number of irq management-related function names >> and prototypes across all architectures, which went into linux-mips.org >> at very roughly the time of the breakage. The SMTC overlay over the irq >> implementation has been pretty robust, but it's written in a perhaps >> doomed attempt to be both efficient and using a maximum amount of common >> code with the general case. A mechanical or semi-mechanical change >> could conceivably have broken things. >> >> Regards, >> >> Kevin K. >> >> >> On 12/31/2010 4:27 AM, Anoop P A wrote: >>> Hi , >>> >>> Kernel hangs on stop_machine call. Please find mt reg dump below. >>> Another important observation is even though 2.6.33 kernel + stackframe >>> patch well passes calibration hang , I am still unable boot in to a >>> initramfs root ( verified ramfs working with VSMP). So it looks like >>> still some issue to fix between 2.6.32 and 2.6.33 . 
>>> ######################## Log ########################### >>> >>> === MIPS MT State Dump === >>> -- Global State -- >>> MVPControl Passed: 00000005 >>> MVPControl Read: 00000004 >>> MVPConf0 : a8008406 >>> -- per-VPE State -- >>> VPE 0 >>> VPEControl : 00008000 >>> VPEConf0 : 800f0003 >>> VPE0.Status : 11004201 >>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 >>> VPE0.Cause : 50804000 >>> VPE0.Config7 : 00010000 >>> VPE 1 >>> VPEControl : 00068006 >>> VPEConf0 : 80cf0003 >>> VPE1.Status : 11008301 >>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 >>> VPE1.Cause : 50800000 >>> VPE1.Config7 : 00010000 >>> -- per-TC State -- >>> TC 0 (current TC with VPE EPC above) >>> TCStatus : 18102000 >>> TCBind : 00000000 >>> TCRestart : 803fa19c printk+0xc/0x30 >>> TCHalt : 00000000 >>> TCContext : 00000000 >>> TC 1 >>> TCStatus : 18902000 >>> TCBind : 00200000 >>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>> TCHalt : 00000000 >>> TCContext : 00140000 >>> TC 2 >>> TCStatus : 18902000 >>> TCBind : 00400000 >>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>> TCHalt : 00000000 >>> TCContext : 00280000 >>> TC 3 >>> TCStatus : 18902000 >>> TCBind : 00600000 >>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>> TCHalt : 00000000 >>> TCContext : 003c0000 >>> TC 4 >>> TCStatus : 18902000 >>> TCBind : 00800001 >>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>> TCHalt : 00000000 >>> TCContext : 00500000 >>> TC 5 >>> TCStatus : 18902000 >>> TCBind : 00a00001 >>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>> TCHalt : 00000000 >>> TCContext : 00640000 >>> TC 6 >>> TCStatus : 18902000 >>> TCBind : 00c00001 >>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>> TCHalt : 00000000 >>> TCContext : 00780000 >>> Counter Interrupts taken per CPU (TC) >>> 0: 0 >>> 1: 0 >>> 2: 0 >>> 3: 0 >>> 4: 0 >>> 5: 0 >>> 6: 0 >>> 7: 0 >>> Self-IPI invocations: >>> 0: 12 >>> 1: 0 >>> 2: 0 >>> 3: 0 >>> 4: 0 >>> 5: 5 >>> 6: 4 >>> 7: 0 >>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[2]: 
head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 >>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 >>> 0 Recoveries of "stolen" FPU >>> =========================== >>> >>> ################################################################ >>> >>> Thanks >>> Anoop >>> >>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote: >>>> I took a quick look last night, and the only thing that looked vaguely >>>> dangerous in changes since the timer changes I alluded to earlier was >>>> the global naming cleanup of irq-related function names that David >>>> Howell submitted. The diff didn't look dangerous in itself, but some of >>>> the definitions are nested subtly for SMTC to maximize the amount of >>>> common code, and I could imagine something getting lost in translation >>>> there. If that were really the problem, it would of course affect much >>>> more than just the timer subsystem, but early in the boot process, >>>> timers are pretty much the only interrupts that have to be handled >>>> correctly. >>>> >>>> I'm travelling today, but will take a look at timekeeping_notify() >>>> tomorrow or the next day... >>>> >>>> /K. >>>> >>>> On 12/28/10 12:19 AM, Anoop P A wrote: >>>>> Hi, >>>>> >>>>> I had a glance into the code diff without notice of any suspect-able >>>>> code . >>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify >>>>> function. >>>>> >>>>> Thanks, >>>>> Anoop >>>>> >>>>> PS: I may not be available until Thursday >>>>> >>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: >>>>>> Hi Kevin, >>>>>> >>>>>> It is very unlikely that the patch you pointed has any impact on the the >>>>>> hang I am seeing. The patch you have mentioned got into kernel around >>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + >>>>>> stackframe patch) . 
>>>>>> >>>>>> Hi Stuart, >>>>>> >>>>>> I haven't got much time to spend on this today. >>>>>> >>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have >>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) >>>>>> >>>>>> So probably some patches in 2.6.37 branch introduced this hang. >>>>>> >>>>>> Hopefully I will get some free slot tomorrow so that I can look into >>>>>> code diff . >>>>>> >>>>>> Thanks >>>>>> Anoop >>>>>> >>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: >>>>>>> Kevin, >>>>>>> >>>>>>> Outstanding, sometimes it's better to be lucky than good. >>>>>>> >>>>>>> >>>>>>> Anoop, >>>>>>> >>>>>>> Maybe we can get lucky again. >>>>>>> >>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, >>>>>>> I'll be happy to do another diff. >>>>>>> >>>>>>> >>>>>>> Hope you'll have had a good Christmas as well. >>>>>>> We've had snow in Alabama since Christmas eve! >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Stuart >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] >>>>>>> Sent: Friday, December 24, 2010 5:34 PM >>>>>>> To: Anoop P A >>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org >>>>>>> Subject: Re: SMTC support status in latest git head. >>>>>>> >>>>>>> >>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's >>>>>>> performance tweak for the deeper pipelined processors. In looking for >>>>>>> this, I did notice that someone did some modification to the SMTC clock >>>>>>> tick logic that I was skeptical had ever been tested. If you've still >>>>>>> got that kernel binary handy, you might check to see if it boots with >>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. >>>>>>> >>>>>>> Oh, yes, and Merry Christmas one and all! >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Kevin K. 
>>>>>>> >>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: >>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: >>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also >>>>>>>>> fix things, while preserving the other fixes and performance enhancements? >>>>>>>>> >>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration >>>>>>>> loop but hangs after switching to mips closource >>>>>>>> >>>>>>>> TC 6 going on-line as CPU 6 >>>>>>>> Brought up 7 CPUs >>>>>>>> bio: create slab<bio-0> at 0 >>>>>>>> SCSI subsystem initialized >>>>>>>> Switching to clocksource MIPS >>>>>>>> >>>>>>>> I Presume this is a different issue as restoring older file didn't help >>>>>>>> much to get rid of this hang. >>>>>>>> >>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h >>>>>>>> b/arch/mips/include/asm/stackframe.h >>>>>>>> index 58730c5..7fc9f10 100644 >>>>>>>> --- a/arch/mips/include/asm/stackframe.h >>>>>>>> +++ b/arch/mips/include/asm/stackframe.h >>>>>>>> @@ -195,9 +195,9 @@ >>>>>>>> * to cover the pipeline delay. >>>>>>>> */ >>>>>>>> .set mips32 >>>>>>>> - mfc0 v1, CP0_TCSTATUS >>>>>>>> + mfc0 v0, CP0_TCSTATUS >>>>>>>> .set mips0 >>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) >>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) >>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ >>>>>>>> LONG_S $4, PT_R4(sp) >>>>>>>> LONG_S $5, PT_R5(sp) >>>>>>>> >>>>>>>> >>>>>>>>> /K. >>>>>>>>> >>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: >>>>>>>>>> Hi Kevin, Stuart , >>>>>>>>>> >>>>>>>>>> Woohooo You guys spotted !. >>>>>>>>>> >>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be >>>>>>>>>> the culprit >>>>>>>>>> >>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started >>>>>>>>>> booting !. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Anoop >>>>>>>>>> >>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: >>>>>>>>>>> Thank you, Stuart! 
I've spotted some definite breakage to SMTC between >>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved >>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, >>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a >>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of >>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the >>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets >>>>>>>>>>> clobbered before it gets stored. This will eventually result in the >>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, >>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. >>>>>>>>>>> >>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. >>>>>>>>>>> >>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS >>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance >>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 >>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd >>>>>>>>>>> lean toward the second option, but I'm not in a position to test and >>>>>>>>>>> submit a patch just now. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> >>>>>>>>>>> Kevin K. >>>>>>>>>>> >>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: >>>>>>>>>>>> Kevin, >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure if it's useful, >>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
>>>>>>>>>>>> works 2.6.32-stable with patch 804 >>>>>>>>>>>> works_not 2.6.33-stable >>>>>>>>>>>> >>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC >>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> arch/mips/include/asm/irq.h >>>>>>>>>>>> arch/mips/kernel/irq.c >>>>>>>>>>>> do_IRQ >>>>>>>>>>>> >>>>>>>>>>>> arch/mips/include/asm/stackframe.h >>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp >>>>>>>>>>>> >>>>>>>>>>>> arch/mips/include/asm/time.h >>>>>>>>>>>> clocksource_set_clock >>>>>>>>>>>> >>>>>>>>>>>> arch/mips/kernel/process.c >>>>>>>>>>>> cpu_idle >>>>>>>>>>>> >>>>>>>>>>>> arch/mips/kernel/smtc.c >>>>>>>>>>>> __irq_entry >>>>>>>>>>>> ipi_decode >>>>>>>>>>>> SMTC_CLOCK_TICK >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. >>>>>>>>>>>> >>>>>>>>>>>> I'll try to look in more detail after Christmas. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> >>>>>>>>>>>> Stuart >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> > > ^ permalink raw reply [flat|nested] 68+ messages in thread
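As a rough illustration of the SAVE_SOME breakage described in the quoted 12/24 analysis, the C sketch below models the staging registers v0/v1 as local variables. It is not the kernel's assembly: the struct and function names are invented for this example, and the Status and TCStatus values are simply borrowed from the VPE 0 / TC 0 lines of the MT state dump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two CP0 registers and the saved frame. */
struct cp0_sketch   { uint32_t status, tcstatus; };
struct frame_sketch { uint32_t saved_status, saved_tcstatus; };

/*
 * Ordering after the pipelining change: Status is staged in v1, but the
 * SMTC fragment reuses v1 for TCStatus before Status has been stored.
 */
static void save_some_clobbered(const struct cp0_sketch *cp0,
                                struct frame_sketch *f)
{
    uint32_t v1;

    v1 = cp0->status;        /* mfc0   v1, CP0_STATUS                     */
    v1 = cp0->tcstatus;      /* mfc0   v1, CP0_TCSTATUS: Status is lost   */
    f->saved_tcstatus = v1;  /* LONG_S v1, PT_TCSTATUS(sp)                */
    f->saved_status = v1;    /* LONG_S v1, PT_STATUS(sp): wrong value     */
}

/* Same ordering, but TCStatus is staged in v0, as in the posted diff. */
static void save_some_fixed(const struct cp0_sketch *cp0,
                            struct frame_sketch *f)
{
    uint32_t v0, v1;

    v1 = cp0->status;        /* mfc0   v1, CP0_STATUS                     */
    v0 = cp0->tcstatus;      /* mfc0   v0, CP0_TCSTATUS                   */
    f->saved_tcstatus = v0;  /* LONG_S v0, PT_TCSTATUS(sp)                */
    f->saved_status = v1;    /* LONG_S v1, PT_STATUS(sp): Status intact   */
}

/* Small self-check using values taken from the dump in this thread. */
void clobber_demo(void)
{
    const struct cp0_sketch cp0 = { 0x11004201u, 0x18102000u };
    struct frame_sketch bad, good;

    save_some_clobbered(&cp0, &bad);
    save_some_fixed(&cp0, &good);

    assert(bad.saved_status == cp0.tcstatus);  /* TCStatus saved as Status */
    assert(good.saved_status == cp0.status);
    assert(good.saved_tcstatus == cp0.tcstatus);
}
```

In the clobbered variant the frame ends up holding the TCStatus value in the Status slot, which is the condition the analysis says will eventually be restored into the real Status register.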
* Re: SMTC support status in latest git head. 2011-01-03 16:14 ` Kevin D. Kissell @ 2011-01-03 19:20 ` Anoop P A 2011-01-04 8:17 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2011-01-03 19:20 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

Hi Kevin,

On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote:
> The very first SMTC implementations didn't support full kernel-mode
> preemption, which anyway wasn't a priority, given the hardware event
> response support in MIPS MT. I believe it was later made compatible,
> but it was never extensively exercised. Since SMTC has fingers in some
> pretty low-level atomicity mechanisms, if a new, parallel set was
> implemented for RCU, I can easily imagine that nobody has yet
> implemented SMTC-ified variants of that set.
>
> Your last statement isn't very clear, though. Are you saying that if
> you configure for no forced preemption and with TREE_RCU, the 2.6.37
> kernel boots all the way up, or that it simply hangs later? What's the
> last rev kernel that actually boots all the way up?

I have debugged this a bit more. It seems that the kernel is stalling while executing on the TCs of the second VPE.

INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected by 1, t=2504 jiffies)
INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected by 1, t=10036 jiffies)
INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected by 1, t=17568 jiffies)
INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected by 1, t=25100 jiffies)
INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected by 1, t=32632 jiffies)
INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected by 1, t=40164 jiffies)

With CONFIG_TREE_RCU we were not hitting this scenario very often. However, with CONFIG_TREE_PREEMPT_RCU the stall happens most of the time. I presume there is some issue in my timer setup.
I am not seeing the timer interrupt (or IPI interrupt) counts incrementing for the VPE1 TCs on a completely booted 2.6.32-stable kernel.

/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6
  1:        148      15023      15140      15093       3779          8          2      MIPS  SMTC_IPI
  6:          0          0          0          0          0          0          0      MIPS  MSP CIC cascade
  8:          0          0          0          0          0          0          0   MSP_CIC  Softreset button
  9:          0          0          0          0          0          0          0   MSP_CIC  Standby switch
 21:          0          0          0          0          0          0          0   MSP_CIC  MSP PER cascade
 25:      15113        341          4          7          0          0          0   MSP_CIC  timer
 27:        260          9          0          1          0          0          0   MSP_CIC  serial
 34:          0          0          0          0          0          0          0   MSP_CIC  timer

Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC? I have tried setting up the VPE1 timer from get_c0_compare_int as follows:

unsigned int __cpuinit get_c0_compare_int(void)
{
        /* On the first call from VPE1, install a copy of the timer irqaction. */
        if ((1 == get_current_vpe()) && !vpe1_timr_installed) {
                memcpy(&timer_vpe1, &c0_compare_irqaction, sizeof(timer_vpe1));
                setup_irq(MSP_INT_VPE1_TIMER, &timer_vpe1);
                vpe1_timr_installed++;
        }
        return (get_current_vpe() ? MSP_INT_VPE1_TIMER : MSP_INT_VPE0_TIMER);
}

Thanks
Anoop

>
> Regards,
>
> Kevin K.
>
> On 1/3/2011 7:12 AM, Anoop P A wrote:
> > Hi ,
> >
> > Following patch restricts TREE_RCU implementation only for !PREEMPT
> > SMP kernel.
> > http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366
> >
> > CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel
> > ( which will be only available RCU implementation for SMTC kernel from
> > 2.6.37 onwards) .
> >
> > With no forced preemption and selecting TREE_RCU I am able to boot
> > further to the hang that I have reported.
> >
> > Thanks
> > Anoop
> >
> > On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote:
> >> At this point the logical thing to do would seem to look at your kernel
> >> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0
> >> shows the last exception to have been taken.
That's a critical SMTC > >> routine that gets called whenever an xxx_irq_restore() enables > >> interrupts, so that virtual per-TC IPI interrupts that were posted while > >> the TC had interrupts disabled can be handled deterministically. As I > >> mentioned in an earlier message, there was some cleanup work from David > >> Howell that changed a number of irq management-related function names > >> and prototypes across all architectures, which went into linux-mips.org > >> at very roughly the time of the breakage. The SMTC overlay over the irq > >> implementation has been pretty robust, but it's written in a perhaps > >> doomed attempt to be both efficient and using a maximum amount of common > >> code with the general case. A mechanical or semi-mechanical change > >> could conceivably have broken things. > >> > >> Regards, > >> > >> Kevin K. > >> > >> > >> On 12/31/2010 4:27 AM, Anoop P A wrote: > >>> Hi , > >>> > >>> Kernel hangs on stop_machine call. Please find mt reg dump below. > >>> Another important observation is even though 2.6.33 kernel + stackframe > >>> patch well passes calibration hang , I am still unable boot in to a > >>> initramfs root ( verified ramfs working with VSMP). So it looks like > >>> still some issue to fix between 2.6.32 and 2.6.33 . 
> >>> ######################## Log ########################### > >>> > >>> === MIPS MT State Dump === > >>> -- Global State -- > >>> MVPControl Passed: 00000005 > >>> MVPControl Read: 00000004 > >>> MVPConf0 : a8008406 > >>> -- per-VPE State -- > >>> VPE 0 > >>> VPEControl : 00008000 > >>> VPEConf0 : 800f0003 > >>> VPE0.Status : 11004201 > >>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 > >>> VPE0.Cause : 50804000 > >>> VPE0.Config7 : 00010000 > >>> VPE 1 > >>> VPEControl : 00068006 > >>> VPEConf0 : 80cf0003 > >>> VPE1.Status : 11008301 > >>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 > >>> VPE1.Cause : 50800000 > >>> VPE1.Config7 : 00010000 > >>> -- per-TC State -- > >>> TC 0 (current TC with VPE EPC above) > >>> TCStatus : 18102000 > >>> TCBind : 00000000 > >>> TCRestart : 803fa19c printk+0xc/0x30 > >>> TCHalt : 00000000 > >>> TCContext : 00000000 > >>> TC 1 > >>> TCStatus : 18902000 > >>> TCBind : 00200000 > >>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>> TCHalt : 00000000 > >>> TCContext : 00140000 > >>> TC 2 > >>> TCStatus : 18902000 > >>> TCBind : 00400000 > >>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>> TCHalt : 00000000 > >>> TCContext : 00280000 > >>> TC 3 > >>> TCStatus : 18902000 > >>> TCBind : 00600000 > >>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>> TCHalt : 00000000 > >>> TCContext : 003c0000 > >>> TC 4 > >>> TCStatus : 18902000 > >>> TCBind : 00800001 > >>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>> TCHalt : 00000000 > >>> TCContext : 00500000 > >>> TC 5 > >>> TCStatus : 18902000 > >>> TCBind : 00a00001 > >>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>> TCHalt : 00000000 > >>> TCContext : 00640000 > >>> TC 6 > >>> TCStatus : 18902000 > >>> TCBind : 00c00001 > >>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>> TCHalt : 00000000 > >>> TCContext : 00780000 > >>> Counter Interrupts taken per CPU (TC) > >>> 0: 0 > >>> 1: 0 > >>> 2: 0 > >>> 3: 0 > >>> 4: 0 > >>> 5: 0 > >>> 6: 0 > >>> 7: 0 > >>> Self-IPI invocations: > >>> 0: 12 > >>> 1: 0 > >>> 
2: 0 > >>> 3: 0 > >>> 4: 0 > >>> 5: 5 > >>> 6: 4 > >>> 7: 0 > >>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > >>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > >>> 0 Recoveries of "stolen" FPU > >>> =========================== > >>> > >>> ################################################################ > >>> > >>> Thanks > >>> Anoop > >>> > >>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote: > >>>> I took a quick look last night, and the only thing that looked vaguely > >>>> dangerous in changes since the timer changes I alluded to earlier was > >>>> the global naming cleanup of irq-related function names that David > >>>> Howell submitted. The diff didn't look dangerous in itself, but some of > >>>> the definitions are nested subtly for SMTC to maximize the amount of > >>>> common code, and I could imagine something getting lost in translation > >>>> there. If that were really the problem, it would of course affect much > >>>> more than just the timer subsystem, but early in the boot process, > >>>> timers are pretty much the only interrupts that have to be handled > >>>> correctly. > >>>> > >>>> I'm travelling today, but will take a look at timekeeping_notify() > >>>> tomorrow or the next day... > >>>> > >>>> /K. > >>>> > >>>> On 12/28/10 12:19 AM, Anoop P A wrote: > >>>>> Hi, > >>>>> > >>>>> I had a glance into the code diff without notice of any suspect-able > >>>>> code . > >>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify > >>>>> function. 
> >>>>> > >>>>> Thanks, > >>>>> Anoop > >>>>> > >>>>> PS: I may not be available until Thursday > >>>>> > >>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: > >>>>>> Hi Kevin, > >>>>>> > >>>>>> It is very unlikely that the patch you pointed has any impact on the the > >>>>>> hang I am seeing. The patch you have mentioned got into kernel around > >>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + > >>>>>> stackframe patch) . > >>>>>> > >>>>>> Hi Stuart, > >>>>>> > >>>>>> I haven't got much time to spend on this today. > >>>>>> > >>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have > >>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) > >>>>>> > >>>>>> So probably some patches in 2.6.37 branch introduced this hang. > >>>>>> > >>>>>> Hopefully I will get some free slot tomorrow so that I can look into > >>>>>> code diff . > >>>>>> > >>>>>> Thanks > >>>>>> Anoop > >>>>>> > >>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: > >>>>>>> Kevin, > >>>>>>> > >>>>>>> Outstanding, sometimes it's better to be lucky than good. > >>>>>>> > >>>>>>> > >>>>>>> Anoop, > >>>>>>> > >>>>>>> Maybe we can get lucky again. > >>>>>>> > >>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, > >>>>>>> I'll be happy to do another diff. > >>>>>>> > >>>>>>> > >>>>>>> Hope you'll have had a good Christmas as well. > >>>>>>> We've had snow in Alabama since Christmas eve! > >>>>>>> > >>>>>>> > >>>>>>> Regards, > >>>>>>> > >>>>>>> Stuart > >>>>>>> > >>>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] > >>>>>>> Sent: Friday, December 24, 2010 5:34 PM > >>>>>>> To: Anoop P A > >>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org > >>>>>>> Subject: Re: SMTC support status in latest git head. 
> >>>>>>> > >>>>>>> > >>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's > >>>>>>> performance tweak for the deeper pipelined processors. In looking for > >>>>>>> this, I did notice that someone did some modification to the SMTC clock > >>>>>>> tick logic that I was skeptical had ever been tested. If you've still > >>>>>>> got that kernel binary handy, you might check to see if it boots with > >>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > >>>>>>> > >>>>>>> Oh, yes, and Merry Christmas one and all! > >>>>>>> > >>>>>>> Regards, > >>>>>>> > >>>>>>> Kevin K. > >>>>>>> > >>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: > >>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > >>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also > >>>>>>>>> fix things, while preserving the other fixes and performance enhancements? > >>>>>>>>> > >>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration > >>>>>>>> loop but hangs after switching to mips closource > >>>>>>>> > >>>>>>>> TC 6 going on-line as CPU 6 > >>>>>>>> Brought up 7 CPUs > >>>>>>>> bio: create slab<bio-0> at 0 > >>>>>>>> SCSI subsystem initialized > >>>>>>>> Switching to clocksource MIPS > >>>>>>>> > >>>>>>>> I Presume this is a different issue as restoring older file didn't help > >>>>>>>> much to get rid of this hang. > >>>>>>>> > >>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h > >>>>>>>> b/arch/mips/include/asm/stackframe.h > >>>>>>>> index 58730c5..7fc9f10 100644 > >>>>>>>> --- a/arch/mips/include/asm/stackframe.h > >>>>>>>> +++ b/arch/mips/include/asm/stackframe.h > >>>>>>>> @@ -195,9 +195,9 @@ > >>>>>>>> * to cover the pipeline delay. 
> >>>>>>>> */ > >>>>>>>> .set mips32 > >>>>>>>> - mfc0 v1, CP0_TCSTATUS > >>>>>>>> + mfc0 v0, CP0_TCSTATUS > >>>>>>>> .set mips0 > >>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) > >>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) > >>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ > >>>>>>>> LONG_S $4, PT_R4(sp) > >>>>>>>> LONG_S $5, PT_R5(sp) > >>>>>>>> > >>>>>>>> > >>>>>>>>> /K. > >>>>>>>>> > >>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: > >>>>>>>>>> Hi Kevin, Stuart , > >>>>>>>>>> > >>>>>>>>>> Woohooo You guys spotted !. > >>>>>>>>>> > >>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > >>>>>>>>>> the culprit > >>>>>>>>>> > >>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started > >>>>>>>>>> booting !. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Anoop > >>>>>>>>>> > >>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > >>>>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between > >>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved > >>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > >>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a > >>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of > >>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the > >>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets > >>>>>>>>>>> clobbered before it gets stored. This will eventually result in the > >>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, > >>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. > >>>>>>>>>>> > >>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. 
> >>>>>>>>>>> > >>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS > >>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance > >>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 > >>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > >>>>>>>>>>> lean toward the second option, but I'm not in a position to test and > >>>>>>>>>>> submit a patch just now. > >>>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> > >>>>>>>>>>> Kevin K. > >>>>>>>>>>> > >>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > >>>>>>>>>>>> Kevin, > >>>>>>>>>>>> > >>>>>>>>>>>> I'm not sure if it's useful, > >>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. > >>>>>>>>>>>> works 2.6.32-stable with patch 804 > >>>>>>>>>>>> works_not 2.6.33-stable > >>>>>>>>>>>> > >>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC > >>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> arch/mips/include/asm/irq.h > >>>>>>>>>>>> arch/mips/kernel/irq.c > >>>>>>>>>>>> do_IRQ > >>>>>>>>>>>> > >>>>>>>>>>>> arch/mips/include/asm/stackframe.h > >>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > >>>>>>>>>>>> > >>>>>>>>>>>> arch/mips/include/asm/time.h > >>>>>>>>>>>> clocksource_set_clock > >>>>>>>>>>>> > >>>>>>>>>>>> arch/mips/kernel/process.c > >>>>>>>>>>>> cpu_idle > >>>>>>>>>>>> > >>>>>>>>>>>> arch/mips/kernel/smtc.c > >>>>>>>>>>>> __irq_entry > >>>>>>>>>>>> ipi_decode > >>>>>>>>>>>> SMTC_CLOCK_TICK > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. > >>>>>>>>>>>> > >>>>>>>>>>>> I'll try to look in more detail after Christmas. 
> >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> > >>>>>>>>>>>> Stuart > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > > > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-03 19:20 ` Anoop P A @ 2011-01-04 8:17 ` Kevin D. Kissell 2011-01-04 13:02 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-04 8:17 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips Those interrupt counters show that IPIs are being taken everywhere, though very few by CPUs 5 and 6. If I understand the configuration correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI rate, *if* we're looking at a tickless kernel under low load. But there may be a clue there to part of your problem. I have no idea why the behavior would have changed from 2.6.36 to 2.6.37, but it looks as if you're getting your clock interrupts through the MSP CIC interrupt controller on VPE 0. There's nothing symmetric for VPE1. The Malta example code is perhaps deceptively simple, in that both VPEs have their count/compare indication wired directly to the 2 clock interrupt inputs, so that having both of them running with only a single set of irq state just works. I don't know whether the MSP CIC timer interrupt is a gating of the VPE0 count/compare output, or whether it's its own interval timer, but I suspect that you may need to do some further low-level initialization in the platform-specific code to set up an interrupt on the VPE1 side. I don't think the snippet you've got below would work as written. If it's purely an issue with clock distribution on VPE1, then a boot with maxvpes=1 maxtcs=4 should be stable. /K. On 1/3/2011 11:20 AM, Anoop P A wrote: > Hi Kevin, > > On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: >> The very first SMTC implementations didn't support full kernel-mode >> preemption, which anyway wasn't a priority, given the hardware event >> response support in MIPS MT. I believe it was later made compatible, >> but it was never extensively exercised. 
Since SMTC has fingers in some >> pretty low-level atomicity mechanisms, if a new, parallel set was >> implemented for RCU, I can easily imagine that nobody has yet >> implemented SMTC-ified variants of that set. >> >> Your last statement isn't very clear, though. Are you saying that if >> you configure for no forced preemption and with TREE_CPU, the 2.6.37 >> kernel boots all the way up, or that it simply hangs later? What's the >> last rev kernel that actually boots all the way up? > I have debugged this a bit more. It seems that kernel getting stalled > while executing on TC's of second VPE . > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > by 1, t=2504 jiffies) > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > by 1, t=10036 jiffies) > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > by 1, t=17568 jiffies) > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > by 1, t=25100 jiffies) > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > by 1, t=32632 jiffies) > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > by 1, t=40164 jiffies) > > With CONFIG_TREE_CPU we were not hitting this scenario very often. > However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. > > I presume some issue in my timer setup . I am not seeing timer interrupt > (or IPI interrupt) getting incremented for VPE1 tcs on a completely > booted 2.6.32-stable kernel. 
> > / # cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > CPU6 > 1: 148 15023 15140 15093 3779 8 > 2 MIPS SMTC_IPI > 6: 0 0 0 0 0 0 > 0 MIPS MSP CIC cascade > 8: 0 0 0 0 0 0 > 0 MSP_CIC Softreset button > 9: 0 0 0 0 0 0 > 0 MSP_CIC Standby switch > 21: 0 0 0 0 0 0 > 0 MSP_CIC MSP PER cascade > 25: 15113 341 4 7 0 0 > 0 MSP_CIC timer > 27: 260 9 0 1 0 0 > 0 MSP_CIC serial > 34: 0 0 0 0 0 0 > 0 MSP_CIC timer > > Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. > > I have tried setting up VPE1 timer from get_co_compare_int as follows > > unsigned int __cpuinit get_c0_compare_int(void) > { > if ((1==get_current_vpe()) && !vpe1_timr_installed){ > > memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); > > setup_irq(MSP_INT_VPE1_TIMER, &timer_vpe1); > vpe1_timr_installed++; > } > return (get_current_vpe() ? MSP_INT_VPE1_TIMER : > MSP_INT_VPE0_TIMER); > } > > Thanks > Anoop > >> Regards, >> >> Kevin K. >> >> On 1/3/2011 7:12 AM, Anoop P A wrote: >>> Hi , >>> >>> Following patch restricts TREE_CPU RCU implementation only for !PREEMPT >>> SMP kernel. >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366 >>> >>> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel >>> ( which will be only available RCU implementation for SMTC kernel from >>> 2.6.37 onwards) . >>> >>> With no forced preemption and selecting TREE_CPU I am able to boot >>> further to the hang that I have reported. >>> >>> Thanks >>> Anoop >>> >>> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote: >>>> At this point the logical thing to do would seem to look at your kernel >>>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0 >>>> shows the last exception to have been taken. 
That's a critical SMTC >>>> routine that gets called whenever an xxx_irq_restore() enables >>>> interrupts, so that virtual per-TC IPI interrupts that were posted while >>>> the TC had interrupts disabled can be handled deterministically. As I >>>> mentioned in an earlier message, there was some cleanup work from David >>>> Howell that changed a number of irq management-related function names >>>> and prototypes across all architectures, which went into linux-mips.org >>>> at very roughly the time of the breakage. The SMTC overlay over the irq >>>> implementation has been pretty robust, but it's written in a perhaps >>>> doomed attempt to be both efficient and using a maximum amount of common >>>> code with the general case. A mechanical or semi-mechanical change >>>> could conceivably have broken things. >>>> >>>> Regards, >>>> >>>> Kevin K. >>>> >>>> >>>> On 12/31/2010 4:27 AM, Anoop P A wrote: >>>>> Hi , >>>>> >>>>> Kernel hangs on stop_machine call. Please find mt reg dump below. >>>>> Another important observation is even though 2.6.33 kernel + stackframe >>>>> patch well passes calibration hang , I am still unable boot in to a >>>>> initramfs root ( verified ramfs working with VSMP). So it looks like >>>>> still some issue to fix between 2.6.32 and 2.6.33 . 
>>>>> ######################## Log ########################### >>>>> >>>>> === MIPS MT State Dump === >>>>> -- Global State -- >>>>> MVPControl Passed: 00000005 >>>>> MVPControl Read: 00000004 >>>>> MVPConf0 : a8008406 >>>>> -- per-VPE State -- >>>>> VPE 0 >>>>> VPEControl : 00008000 >>>>> VPEConf0 : 800f0003 >>>>> VPE0.Status : 11004201 >>>>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 >>>>> VPE0.Cause : 50804000 >>>>> VPE0.Config7 : 00010000 >>>>> VPE 1 >>>>> VPEControl : 00068006 >>>>> VPEConf0 : 80cf0003 >>>>> VPE1.Status : 11008301 >>>>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 >>>>> VPE1.Cause : 50800000 >>>>> VPE1.Config7 : 00010000 >>>>> -- per-TC State -- >>>>> TC 0 (current TC with VPE EPC above) >>>>> TCStatus : 18102000 >>>>> TCBind : 00000000 >>>>> TCRestart : 803fa19c printk+0xc/0x30 >>>>> TCHalt : 00000000 >>>>> TCContext : 00000000 >>>>> TC 1 >>>>> TCStatus : 18902000 >>>>> TCBind : 00200000 >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>>>> TCHalt : 00000000 >>>>> TCContext : 00140000 >>>>> TC 2 >>>>> TCStatus : 18902000 >>>>> TCBind : 00400000 >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>>>> TCHalt : 00000000 >>>>> TCContext : 00280000 >>>>> TC 3 >>>>> TCStatus : 18902000 >>>>> TCBind : 00600000 >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>>>> TCHalt : 00000000 >>>>> TCContext : 003c0000 >>>>> TC 4 >>>>> TCStatus : 18902000 >>>>> TCBind : 00800001 >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>>>> TCHalt : 00000000 >>>>> TCContext : 00500000 >>>>> TC 5 >>>>> TCStatus : 18902000 >>>>> TCBind : 00a00001 >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>>>> TCHalt : 00000000 >>>>> TCContext : 00640000 >>>>> TC 6 >>>>> TCStatus : 18902000 >>>>> TCBind : 00c00001 >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>>>> TCHalt : 00000000 >>>>> TCContext : 00780000 >>>>> Counter Interrupts taken per CPU (TC) >>>>> 0: 0 >>>>> 1: 0 >>>>> 2: 0 >>>>> 3: 0 >>>>> 4: 0 >>>>> 5: 0 >>>>> 6: 0 >>>>> 7: 0 >>>>> Self-IPI invocations: >>>>> 0: 12 >>>>> 1: 0 >>>>> 
2: 0 >>>>> 3: 0 >>>>> 4: 0 >>>>> 5: 5 >>>>> 6: 4 >>>>> 7: 0 >>>>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 >>>>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 >>>>> 0 Recoveries of "stolen" FPU >>>>> =========================== >>>>> >>>>> ################################################################ >>>>> >>>>> Thanks >>>>> Anoop >>>>> >>>>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote: >>>>>> I took a quick look last night, and the only thing that looked vaguely >>>>>> dangerous in changes since the timer changes I alluded to earlier was >>>>>> the global naming cleanup of irq-related function names that David >>>>>> Howell submitted. The diff didn't look dangerous in itself, but some of >>>>>> the definitions are nested subtly for SMTC to maximize the amount of >>>>>> common code, and I could imagine something getting lost in translation >>>>>> there. If that were really the problem, it would of course affect much >>>>>> more than just the timer subsystem, but early in the boot process, >>>>>> timers are pretty much the only interrupts that have to be handled >>>>>> correctly. >>>>>> >>>>>> I'm travelling today, but will take a look at timekeeping_notify() >>>>>> tomorrow or the next day... >>>>>> >>>>>> /K. >>>>>> >>>>>> On 12/28/10 12:19 AM, Anoop P A wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I had a glance into the code diff without notice of any suspect-able >>>>>>> code . >>>>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify >>>>>>> function. 
>>>>>>> >>>>>>> Thanks, >>>>>>> Anoop >>>>>>> >>>>>>> PS: I may not be available until Thursday >>>>>>> >>>>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: >>>>>>>> Hi Kevin, >>>>>>>> >>>>>>>> It is very unlikely that the patch you pointed has any impact on the the >>>>>>>> hang I am seeing. The patch you have mentioned got into kernel around >>>>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + >>>>>>>> stackframe patch) . >>>>>>>> >>>>>>>> Hi Stuart, >>>>>>>> >>>>>>>> I haven't got much time to spend on this today. >>>>>>>> >>>>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have >>>>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) >>>>>>>> >>>>>>>> So probably some patches in 2.6.37 branch introduced this hang. >>>>>>>> >>>>>>>> Hopefully I will get some free slot tomorrow so that I can look into >>>>>>>> code diff . >>>>>>>> >>>>>>>> Thanks >>>>>>>> Anoop >>>>>>>> >>>>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: >>>>>>>>> Kevin, >>>>>>>>> >>>>>>>>> Outstanding, sometimes it's better to be lucky than good. >>>>>>>>> >>>>>>>>> >>>>>>>>> Anoop, >>>>>>>>> >>>>>>>>> Maybe we can get lucky again. >>>>>>>>> >>>>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, >>>>>>>>> I'll be happy to do another diff. >>>>>>>>> >>>>>>>>> >>>>>>>>> Hope you'll have had a good Christmas as well. >>>>>>>>> We've had snow in Alabama since Christmas eve! >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Stuart >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] >>>>>>>>> Sent: Friday, December 24, 2010 5:34 PM >>>>>>>>> To: Anoop P A >>>>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org >>>>>>>>> Subject: Re: SMTC support status in latest git head. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's >>>>>>>>> performance tweak for the deeper pipelined processors. In looking for >>>>>>>>> this, I did notice that someone did some modification to the SMTC clock >>>>>>>>> tick logic that I was skeptical had ever been tested. If you've still >>>>>>>>> got that kernel binary handy, you might check to see if it boots with >>>>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. >>>>>>>>> >>>>>>>>> Oh, yes, and Merry Christmas one and all! >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Kevin K. >>>>>>>>> >>>>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: >>>>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: >>>>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also >>>>>>>>>>> fix things, while preserving the other fixes and performance enhancements? >>>>>>>>>>> >>>>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration >>>>>>>>>> loop but hangs after switching to mips closource >>>>>>>>>> >>>>>>>>>> TC 6 going on-line as CPU 6 >>>>>>>>>> Brought up 7 CPUs >>>>>>>>>> bio: create slab<bio-0> at 0 >>>>>>>>>> SCSI subsystem initialized >>>>>>>>>> Switching to clocksource MIPS >>>>>>>>>> >>>>>>>>>> I Presume this is a different issue as restoring older file didn't help >>>>>>>>>> much to get rid of this hang. >>>>>>>>>> >>>>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h >>>>>>>>>> b/arch/mips/include/asm/stackframe.h >>>>>>>>>> index 58730c5..7fc9f10 100644 >>>>>>>>>> --- a/arch/mips/include/asm/stackframe.h >>>>>>>>>> +++ b/arch/mips/include/asm/stackframe.h >>>>>>>>>> @@ -195,9 +195,9 @@ >>>>>>>>>> * to cover the pipeline delay. 
>>>>>>>>>> */ >>>>>>>>>> .set mips32 >>>>>>>>>> - mfc0 v1, CP0_TCSTATUS >>>>>>>>>> + mfc0 v0, CP0_TCSTATUS >>>>>>>>>> .set mips0 >>>>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) >>>>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) >>>>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ >>>>>>>>>> LONG_S $4, PT_R4(sp) >>>>>>>>>> LONG_S $5, PT_R5(sp) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> /K. >>>>>>>>>>> >>>>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: >>>>>>>>>>>> Hi Kevin, Stuart , >>>>>>>>>>>> >>>>>>>>>>>> Woohooo You guys spotted !. >>>>>>>>>>>> >>>>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be >>>>>>>>>>>> the culprit >>>>>>>>>>>> >>>>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started >>>>>>>>>>>> booting !. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Anoop >>>>>>>>>>>> >>>>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: >>>>>>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between >>>>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved >>>>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, >>>>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a >>>>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of >>>>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the >>>>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets >>>>>>>>>>>>> clobbered before it gets stored. This will eventually result in the >>>>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, >>>>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS >>>>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance >>>>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 >>>>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd >>>>>>>>>>>>> lean toward the second option, but I'm not in a position to test and >>>>>>>>>>>>> submit a patch just now. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> >>>>>>>>>>>>> Kevin K. >>>>>>>>>>>>> >>>>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: >>>>>>>>>>>>>> Kevin, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm not sure if it's useful, >>>>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. >>>>>>>>>>>>>> works 2.6.32-stable with patch 804 >>>>>>>>>>>>>> works_not 2.6.33-stable >>>>>>>>>>>>>> >>>>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC >>>>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> arch/mips/include/asm/irq.h >>>>>>>>>>>>>> arch/mips/kernel/irq.c >>>>>>>>>>>>>> do_IRQ >>>>>>>>>>>>>> >>>>>>>>>>>>>> arch/mips/include/asm/stackframe.h >>>>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp >>>>>>>>>>>>>> >>>>>>>>>>>>>> arch/mips/include/asm/time.h >>>>>>>>>>>>>> clocksource_set_clock >>>>>>>>>>>>>> >>>>>>>>>>>>>> arch/mips/kernel/process.c >>>>>>>>>>>>>> cpu_idle >>>>>>>>>>>>>> >>>>>>>>>>>>>> arch/mips/kernel/smtc.c >>>>>>>>>>>>>> __irq_entry >>>>>>>>>>>>>> ipi_decode >>>>>>>>>>>>>> SMTC_CLOCK_TICK >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'll try to look in more detail after Christmas. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stuart >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-04 8:17 ` Kevin D. Kissell @ 2011-01-04 13:02 ` Anoop P A 2011-01-04 14:37 ` Anoop P A 2011-01-04 17:40 ` Kevin D. Kissell 0 siblings, 2 replies; 68+ messages in thread From: Anoop P A @ 2011-01-04 13:02 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips On Tue, 2011-01-04 at 00:17 -0800, Kevin D. Kissell wrote: > Those interrupt counters show that IPIs are being taken everywhere, > though very few by CPUs 5 and 6. If I understand the configuration > correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI Yes, CPU4 is in the second VPE. > rate, *if* we're looking at a tickless kernel under low load. But there No, it was not a tickless kernel. I had selected the 250 MHz timer. Can't we expect IPI / timer interrupts for all the threads in this case? > may be a clue there to part of your problem. I have no idea why the > behavior would have changed from 2.6.36 to 2.6.37, but it looks as if > you're getting your clock interrupts through the MSP CIC interrupt > controller on VPE 0. There's nothing symmetric for VPE1. The Malta > example code is perhaps deceptively simple, in that both VPEs have their > count/compare indication wired directly to the 2 clock interrupt inputs, > so that having both of them running with only a single set of irq state > just works. I don't know whether the MSP CIC timer interrupt is a In my case they are separate irqs. MSP_INT_VPE1_TIMER (34) and MSP_INT_VPE0_TIMER (25) are wired to the CIC. The CIC interrupt is connected to cpu irq 6. I can reproduce the cpu stall in VSMP mode if I don't set up the VPE1 timer interrupt. Don't we have support for separate irqs in the SMTC implementation? > gating of the VPE0 count/compare output, or whether it's its own > interval timer, but I suspect that you may need to do some further > low-level initialization in the platform-specific code to set up an > interrupt on the VPE1 side. 
I don't think the snippet you've got below > would work as written. The routine which I copied works fine in VSMP mode. / # cat /proc/interrupts CPU0 CPU1 0: 187 254 MIPS IPI_resched 1: 77 174 MIPS IPI_call 6: 0 0 MIPS MSP CIC cascade 8: 0 0 MSP_CIC Softreset button 9: 0 0 MSP_CIC Standby switch 21: 0 0 MSP_CIC MSP PER cascade 25: 37077 0 MSP_CIC timer 27: 188 0 MSP_CIC serial 34: 0 36986 MSP_CIC timer Do I need to change anything specific for SMTC? > > If it's purely an issue with clock distribution on VPE1, then a boot > with maxvpes=1 maxtcs=4 should be stable. Yes, the kernel seems to be stable if I boot with maxvpes=1 maxtcs=4. > > /K. > > On 1/3/2011 11:20 AM, Anoop P A wrote: > > Hi Kevin, > > > > On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: > >> The very first SMTC implementations didn't support full kernel-mode > >> preemption, which anyway wasn't a priority, given the hardware event > >> response support in MIPS MT. I believe it was later made compatible, > >> but it was never extensively exercised. Since SMTC has fingers in some > >> pretty low-level atomicity mechanisms, if a new, parallel set was > >> implemented for RCU, I can easily imagine that nobody has yet > >> implemented SMTC-ified variants of that set. > >> > >> Your last statement isn't very clear, though. Are you saying that if > >> you configure for no forced preemption and with TREE_CPU, the 2.6.37 > >> kernel boots all the way up, or that it simply hangs later? What's the > >> last rev kernel that actually boots all the way up? > > I have debugged this a bit more. It seems that kernel getting stalled > > while executing on TC's of second VPE . 
> > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > by 1, t=2504 jiffies) > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > by 1, t=10036 jiffies) > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > by 1, t=17568 jiffies) > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > by 1, t=25100 jiffies) > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > by 1, t=32632 jiffies) > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > by 1, t=40164 jiffies) > > > > With CONFIG_TREE_CPU we were not hitting this scenario very often. > > However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. > > > > I presume some issue in my timer setup . I am not seeing timer interrupt > > (or IPI interrupt) getting incremented for VPE1 tcs on a completely > > booted 2.6.32-stable kernel. > > > > / # cat /proc/interrupts > > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > > CPU6 > > 1: 148 15023 15140 15093 3779 8 > > 2 MIPS SMTC_IPI > > 6: 0 0 0 0 0 0 > > 0 MIPS MSP CIC cascade > > 8: 0 0 0 0 0 0 > > 0 MSP_CIC Softreset button > > 9: 0 0 0 0 0 0 > > 0 MSP_CIC Standby switch > > 21: 0 0 0 0 0 0 > > 0 MSP_CIC MSP PER cascade > > 25: 15113 341 4 7 0 0 > > 0 MSP_CIC timer > > 27: 260 9 0 1 0 0 > > 0 MSP_CIC serial > > 34: 0 0 0 0 0 0 > > 0 MSP_CIC timer > > > > Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. > > > > I have tried setting up VPE1 timer from get_co_compare_int as follows > > > > unsigned int __cpuinit get_c0_compare_int(void) > > { > > if ((1==get_current_vpe()) && !vpe1_timr_installed){ > > > > memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); > > > > setup_irq(MSP_INT_VPE1_TIMER, &timer_vpe1); > > vpe1_timr_installed++; > > } > > return (get_current_vpe() ? MSP_INT_VPE1_TIMER : > > MSP_INT_VPE0_TIMER); > > } > > > > Thanks > > Anoop > > > >> Regards, > >> > >> Kevin K. 
> >> > >> On 1/3/2011 7:12 AM, Anoop P A wrote: > >>> Hi , > >>> > >>> Following patch restricts TREE_CPU RCU implementation only for !PREEMPT > >>> SMP kernel. > >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366 > >>> > >>> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel > >>> ( which will be only available RCU implementation for SMTC kernel from > >>> 2.6.37 onwards) . > >>> > >>> With no forced preemption and selecting TREE_CPU I am able to boot > >>> further to the hang that I have reported. > >>> > >>> Thanks > >>> Anoop > >>> > >>> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote: > >>>> At this point the logical thing to do would seem to look at your kernel > >>>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0 > >>>> shows the last exception to have been taken. That's a critical SMTC > >>>> routine that gets called whenever an xxx_irq_restore() enables > >>>> interrupts, so that virtual per-TC IPI interrupts that were posted while > >>>> the TC had interrupts disabled can be handled deterministically. As I > >>>> mentioned in an earlier message, there was some cleanup work from David > >>>> Howell that changed a number of irq management-related function names > >>>> and prototypes across all architectures, which went into linux-mips.org > >>>> at very roughly the time of the breakage. The SMTC overlay over the irq > >>>> implementation has been pretty robust, but it's written in a perhaps > >>>> doomed attempt to be both efficient and using a maximum amount of common > >>>> code with the general case. A mechanical or semi-mechanical change > >>>> could conceivably have broken things. > >>>> > >>>> Regards, > >>>> > >>>> Kevin K. > >>>> > >>>> > >>>> On 12/31/2010 4:27 AM, Anoop P A wrote: > >>>>> Hi , > >>>>> > >>>>> Kernel hangs on stop_machine call. Please find mt reg dump below. 
> >>>>> Another important observation is even though 2.6.33 kernel + stackframe > >>>>> patch well passes calibration hang , I am still unable boot in to a > >>>>> initramfs root ( verified ramfs working with VSMP). So it looks like > >>>>> still some issue to fix between 2.6.32 and 2.6.33 . > >>>>> ######################## Log ########################### > >>>>> > >>>>> === MIPS MT State Dump === > >>>>> -- Global State -- > >>>>> MVPControl Passed: 00000005 > >>>>> MVPControl Read: 00000004 > >>>>> MVPConf0 : a8008406 > >>>>> -- per-VPE State -- > >>>>> VPE 0 > >>>>> VPEControl : 00008000 > >>>>> VPEConf0 : 800f0003 > >>>>> VPE0.Status : 11004201 > >>>>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 > >>>>> VPE0.Cause : 50804000 > >>>>> VPE0.Config7 : 00010000 > >>>>> VPE 1 > >>>>> VPEControl : 00068006 > >>>>> VPEConf0 : 80cf0003 > >>>>> VPE1.Status : 11008301 > >>>>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 > >>>>> VPE1.Cause : 50800000 > >>>>> VPE1.Config7 : 00010000 > >>>>> -- per-TC State -- > >>>>> TC 0 (current TC with VPE EPC above) > >>>>> TCStatus : 18102000 > >>>>> TCBind : 00000000 > >>>>> TCRestart : 803fa19c printk+0xc/0x30 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 00000000 > >>>>> TC 1 > >>>>> TCStatus : 18902000 > >>>>> TCBind : 00200000 > >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 00140000 > >>>>> TC 2 > >>>>> TCStatus : 18902000 > >>>>> TCBind : 00400000 > >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 00280000 > >>>>> TC 3 > >>>>> TCStatus : 18902000 > >>>>> TCBind : 00600000 > >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 003c0000 > >>>>> TC 4 > >>>>> TCStatus : 18902000 > >>>>> TCBind : 00800001 > >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 00500000 > >>>>> TC 5 > >>>>> TCStatus : 18902000 > >>>>> TCBind : 00a00001 > >>>>> TCRestart : 8010229c 
r4k_wait+0x1c/0x40 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 00640000 > >>>>> TC 6 > >>>>> TCStatus : 18902000 > >>>>> TCBind : 00c00001 > >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>>>> TCHalt : 00000000 > >>>>> TCContext : 00780000 > >>>>> Counter Interrupts taken per CPU (TC) > >>>>> 0: 0 > >>>>> 1: 0 > >>>>> 2: 0 > >>>>> 3: 0 > >>>>> 4: 0 > >>>>> 5: 0 > >>>>> 6: 0 > >>>>> 7: 0 > >>>>> Self-IPI invocations: > >>>>> 0: 12 > >>>>> 1: 0 > >>>>> 2: 0 > >>>>> 3: 0 > >>>>> 4: 0 > >>>>> 5: 5 > >>>>> 6: 4 > >>>>> 7: 0 > >>>>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > >>>>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > >>>>> 0 Recoveries of "stolen" FPU > >>>>> =========================== > >>>>> > >>>>> ################################################################ > >>>>> > >>>>> Thanks > >>>>> Anoop > >>>>> > >>>>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote: > >>>>>> I took a quick look last night, and the only thing that looked vaguely > >>>>>> dangerous in changes since the timer changes I alluded to earlier was > >>>>>> the global naming cleanup of irq-related function names that David > >>>>>> Howell submitted. The diff didn't look dangerous in itself, but some of > >>>>>> the definitions are nested subtly for SMTC to maximize the amount of > >>>>>> common code, and I could imagine something getting lost in translation > >>>>>> there. If that were really the problem, it would of course affect much > >>>>>> more than just the timer subsystem, but early in the boot process, > >>>>>> timers are pretty much the only interrupts that have to be handled > >>>>>> correctly. 
> >>>>>> > >>>>>> I'm travelling today, but will take a look at timekeeping_notify() > >>>>>> tomorrow or the next day... > >>>>>> > >>>>>> /K. > >>>>>> > >>>>>> On 12/28/10 12:19 AM, Anoop P A wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I had a glance into the code diff without notice of any suspect-able > >>>>>>> code . > >>>>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify > >>>>>>> function. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Anoop > >>>>>>> > >>>>>>> PS: I may not be available until Thursday > >>>>>>> > >>>>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: > >>>>>>>> Hi Kevin, > >>>>>>>> > >>>>>>>> It is very unlikely that the patch you pointed has any impact on the the > >>>>>>>> hang I am seeing. The patch you have mentioned got into kernel around > >>>>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + > >>>>>>>> stackframe patch) . > >>>>>>>> > >>>>>>>> Hi Stuart, > >>>>>>>> > >>>>>>>> I haven't got much time to spend on this today. > >>>>>>>> > >>>>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have > >>>>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) > >>>>>>>> > >>>>>>>> So probably some patches in 2.6.37 branch introduced this hang. > >>>>>>>> > >>>>>>>> Hopefully I will get some free slot tomorrow so that I can look into > >>>>>>>> code diff . > >>>>>>>> > >>>>>>>> Thanks > >>>>>>>> Anoop > >>>>>>>> > >>>>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: > >>>>>>>>> Kevin, > >>>>>>>>> > >>>>>>>>> Outstanding, sometimes it's better to be lucky than good. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Anoop, > >>>>>>>>> > >>>>>>>>> Maybe we can get lucky again. > >>>>>>>>> > >>>>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, > >>>>>>>>> I'll be happy to do another diff. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Hope you'll have had a good Christmas as well. 
> >>>>>>>>> We've had snow in Alabama since Christmas eve! > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> > >>>>>>>>> Stuart > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] > >>>>>>>>> Sent: Friday, December 24, 2010 5:34 PM > >>>>>>>>> To: Anoop P A > >>>>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org > >>>>>>>>> Subject: Re: SMTC support status in latest git head. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's > >>>>>>>>> performance tweak for the deeper pipelined processors. In looking for > >>>>>>>>> this, I did notice that someone did some modification to the SMTC clock > >>>>>>>>> tick logic that I was skeptical had ever been tested. If you've still > >>>>>>>>> got that kernel binary handy, you might check to see if it boots with > >>>>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > >>>>>>>>> > >>>>>>>>> Oh, yes, and Merry Christmas one and all! > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> > >>>>>>>>> Kevin K. > >>>>>>>>> > >>>>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: > >>>>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > >>>>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also > >>>>>>>>>>> fix things, while preserving the other fixes and performance enhancements? > >>>>>>>>>>> > >>>>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration > >>>>>>>>>> loop but hangs after switching to mips closource > >>>>>>>>>> > >>>>>>>>>> TC 6 going on-line as CPU 6 > >>>>>>>>>> Brought up 7 CPUs > >>>>>>>>>> bio: create slab<bio-0> at 0 > >>>>>>>>>> SCSI subsystem initialized > >>>>>>>>>> Switching to clocksource MIPS > >>>>>>>>>> > >>>>>>>>>> I Presume this is a different issue as restoring older file didn't help > >>>>>>>>>> much to get rid of this hang. 
> >>>>>>>>>> > >>>>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h > >>>>>>>>>> b/arch/mips/include/asm/stackframe.h > >>>>>>>>>> index 58730c5..7fc9f10 100644 > >>>>>>>>>> --- a/arch/mips/include/asm/stackframe.h > >>>>>>>>>> +++ b/arch/mips/include/asm/stackframe.h > >>>>>>>>>> @@ -195,9 +195,9 @@ > >>>>>>>>>> * to cover the pipeline delay. > >>>>>>>>>> */ > >>>>>>>>>> .set mips32 > >>>>>>>>>> - mfc0 v1, CP0_TCSTATUS > >>>>>>>>>> + mfc0 v0, CP0_TCSTATUS > >>>>>>>>>> .set mips0 > >>>>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) > >>>>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) > >>>>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ > >>>>>>>>>> LONG_S $4, PT_R4(sp) > >>>>>>>>>> LONG_S $5, PT_R5(sp) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> /K. > >>>>>>>>>>> > >>>>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: > >>>>>>>>>>>> Hi Kevin, Stuart , > >>>>>>>>>>>> > >>>>>>>>>>>> Woohooo You guys spotted !. > >>>>>>>>>>>> > >>>>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > >>>>>>>>>>>> the culprit > >>>>>>>>>>>> > >>>>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started > >>>>>>>>>>>> booting !. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Anoop > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > >>>>>>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between > >>>>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved > >>>>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > >>>>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a > >>>>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of > >>>>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the > >>>>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets > >>>>>>>>>>>>> clobbered before it gets stored. 
This will eventually result in the > >>>>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, > >>>>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS > >>>>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance > >>>>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 > >>>>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > >>>>>>>>>>>>> lean toward the second option, but I'm not in a position to test and > >>>>>>>>>>>>> submit a patch just now. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regards, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Kevin K. > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > >>>>>>>>>>>>>> Kevin, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I'm not sure if it's useful, > >>>>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
> >>>>>>>>>>>>>> works 2.6.32-stable with patch 804 > >>>>>>>>>>>>>> works_not 2.6.33-stable > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC > >>>>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> arch/mips/include/asm/irq.h > >>>>>>>>>>>>>> arch/mips/kernel/irq.c > >>>>>>>>>>>>>> do_IRQ > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> arch/mips/include/asm/stackframe.h > >>>>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> arch/mips/include/asm/time.h > >>>>>>>>>>>>>> clocksource_set_clock > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> arch/mips/kernel/process.c > >>>>>>>>>>>>>> cpu_idle > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> arch/mips/kernel/smtc.c > >>>>>>>>>>>>>> __irq_entry > >>>>>>>>>>>>>> ipi_decode > >>>>>>>>>>>>>> SMTC_CLOCK_TICK > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I'll try to look in more detail after Christmas. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Stuart > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-04 13:02 ` Anoop P A @ 2011-01-04 14:37 ` Anoop P A 2011-01-04 17:21 ` Kevin D. Kissell 2011-01-04 17:40 ` Kevin D. Kissell 1 sibling, 1 reply; 68+ messages in thread From: Anoop P A @ 2011-01-04 14:37 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

Hi Kevin,

The stackframe patch that you suggested had some side effects: I was
unable to execute init. When I changed it to something like the diff
below, it started working. Could you kindly review it?

diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h
index 58730c5..da786ed 100644
--- a/arch/mips/include/asm/stackframe.h
+++ b/arch/mips/include/asm/stackframe.h
@@ -181,14 +181,6 @@
 #endif
 		LONG_S	k0, PT_R29(sp)
 		LONG_S	$3, PT_R3(sp)
-		/*
-		 * You might think that you don't need to save $0,
-		 * but the FPU emulator and gdb remote debug stub
-		 * need it to operate correctly
-		 */
-		LONG_S	$0, PT_R0(sp)
-		mfc0	v1, CP0_STATUS
-		LONG_S	$2, PT_R2(sp)
 #ifdef CONFIG_MIPS_MT_SMTC
 		/*
 		 * Ideally, these instructions would be shuffled in
@@ -199,6 +191,14 @@
 		.set	mips0
 		LONG_S	v1, PT_TCSTATUS(sp)
 #endif /* CONFIG_MIPS_MT_SMTC */
+		/*
+		 * You might think that you don't need to save $0,
+		 * but the FPU emulator and gdb remote debug stub
+		 * need it to operate correctly
+		 */
+		LONG_S	$0, PT_R0(sp)
+		mfc0	v1, CP0_STATUS
+		LONG_S	$2, PT_R2(sp)
 		LONG_S	$4, PT_R4(sp)
 		LONG_S	$5, PT_R5(sp)
 		LONG_S	v1, PT_STATUS(sp)

Linux-2.6.37-rc7 boots all the way if I specify maxvpes=1 on the command
line.

/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6
  1:        249     218024     218286     218263     218235     218208     218179      MIPS  SMTC_IPI
  6:          0          0          0          0          0          0          0      MIPS  MSP CIC cascade
  8:          0          0          0          0          0          0          0   MSP_CIC  Softreset button
  9:          0          0          0          0          0          0          0   MSP_CIC  Standby switch
 21:          0          0          0          0          0          0          0   MSP_CIC  MSP PER cascade
 25:     218128        711         11          0          0          0          0   MSP_CIC  timer
 27:        341         22          0          0          2          0          6   MSP_CIC  serial
ERR:          0
/ # uname -a
Linux (none) 2.6.37-rc7-pmc-00001-g9cff2d6-dirty #289 SMP PREEMPT Tue Jan 4 19:48:31 IST 2011 mips GNU/Linux

So the clock setup/distribution on VPE1 is something that needs fixing.

Thanks
Anoop

On Tue, 2011-01-04 at 18:32 +0530, Anoop P A wrote:
> On Tue, 2011-01-04 at 00:17 -0800, Kevin D. Kissell wrote:
> > Those interrupt counters show that IPIs are being taken everywhere,
> > though very few by CPUs 5 and 6. If I understand the configuration
> > correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI
> Yes CPU4 is in second VPE
>
> > rate, *if* we're looking at a tickless kernel under low load. But there
> No it was not the tickless kernel.I had selected 250 MHz timer. can't we
> expect IPI / timer interrupt for all the threads in this case ?.
>
> > may be a clue there to part of your problem. I have no idea why the
> > behavior would have changed from 2.6.36 to 2.6.37, but it looks as if
> > you're getting your clock interrupts through the MSP CIC interrupt
> > controller on VPE 0. There's nothing symmetric for VPE1. The Malta
> > example code is perhaps deceptively simple, in that both VPEs have their
> > count/compare indication wired directly to the 2 clock interrupt inputs,
> > so that having both of them running with only a single set of irq state
> > just works. I don't know whether the MSP CIC timer interrupt is a
>
> In my case it is separate irq. MSP_INT_VPE1_TIMER (34) and
> MSP_INT_VPE0_TIMER (25) are wired to CIC . CIC interrupt has been
> connected to cpu irq 6.
>
> I can reproduce cpu stall in VSMP mode If I don't setup VPE1 timer
> interrupt .
Don't we have support for separate irq in SMTC > implementation ?.. > > > gating of the VPE0 count/compare output, or whether it's it's own > > interval timer, but I suspect that you may need to do some further > > low-level initialization in the platform-specific code to set up an > > interrupt on the VPE1 side. I don't think the snippet you've got below > > would work as written. > > The routine which I copied works fine for VSMP mode . > > / # cat /proc/interrupts > CPU0 CPU1 > 0: 187 254 MIPS IPI_resched > 1: 77 174 MIPS IPI_call > 6: 0 0 MIPS MSP CIC cascade > 8: 0 0 MSP_CIC Softreset button > 9: 0 0 MSP_CIC Standby switch > 21: 0 0 MSP_CIC MSP PER cascade > 25: 37077 0 MSP_CIC timer > 27: 188 0 MSP_CIC serial > 34: 0 36986 MSP_CIC timer > > Do I want to change anything specific for SMTC ? . > > > > > If it's purely an issue with clock distribution on VPE1, then a boot > > with maxvpes=1 maxtcs=4 should be stable. > > Yes the kernel seems to be stable if I boot with maxvpes=1 maxtcs=4 . > > > > > /K. > > > > On 1/3/2011 11:20 AM, Anoop P A wrote: > > > Hi Kevin, > > > > > > On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: > > >> The very first SMTC implementations didn't support full kernel-mode > > >> preemption, which anyway wasn't a priority, given the hardware event > > >> response support in MIPS MT. I believe it was later made compatible, > > >> but it was never extensively exercised. Since SMTC has fingers in some > > >> pretty low-level atomicity mechanisms, if a new, parallel set was > > >> implemented for RCU, I can easily imagine that nobody has yet > > >> implemented SMTC-ified variants of that set. > > >> > > >> Your last statement isn't very clear, though. Are you saying that if > > >> you configure for no forced preemption and with TREE_CPU, the 2.6.37 > > >> kernel boots all the way up, or that it simply hangs later? What's the > > >> last rev kernel that actually boots all the way up? > > > I have debugged this a bit more. 
It seems that kernel getting stalled > > > while executing on TC's of second VPE . > > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > > by 1, t=2504 jiffies) > > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > > by 1, t=10036 jiffies) > > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > > by 1, t=17568 jiffies) > > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > > by 1, t=25100 jiffies) > > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > > by 1, t=32632 jiffies) > > > INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > > > by 1, t=40164 jiffies) > > > > > > With CONFIG_TREE_CPU we were not hitting this scenario very often. > > > However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. > > > > > > I presume some issue in my timer setup . I am not seeing timer interrupt > > > (or IPI interrupt) getting incremented for VPE1 tcs on a completely > > > booted 2.6.32-stable kernel. > > > > > > / # cat /proc/interrupts > > > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > > > CPU6 > > > 1: 148 15023 15140 15093 3779 8 > > > 2 MIPS SMTC_IPI > > > 6: 0 0 0 0 0 0 > > > 0 MIPS MSP CIC cascade > > > 8: 0 0 0 0 0 0 > > > 0 MSP_CIC Softreset button > > > 9: 0 0 0 0 0 0 > > > 0 MSP_CIC Standby switch > > > 21: 0 0 0 0 0 0 > > > 0 MSP_CIC MSP PER cascade > > > 25: 15113 341 4 7 0 0 > > > 0 MSP_CIC timer > > > 27: 260 9 0 1 0 0 > > > 0 MSP_CIC serial > > > 34: 0 0 0 0 0 0 > > > 0 MSP_CIC timer > > > > > > Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. 
> > > > > > I have tried setting up VPE1 timer from get_co_compare_int as follows > > > > > > unsigned int __cpuinit get_c0_compare_int(void) > > > { > > > if ((1==get_current_vpe()) && !vpe1_timr_installed){ > > > > > > memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); > > > > > > setup_irq(MSP_INT_VPE1_TIMER, &timer_vpe1); > > > vpe1_timr_installed++; > > > } > > > return (get_current_vpe() ? MSP_INT_VPE1_TIMER : > > > MSP_INT_VPE0_TIMER); > > > } > > > > > > Thanks > > > Anoop > > > > > >> Regards, > > >> > > >> Kevin K. > > >> > > >> On 1/3/2011 7:12 AM, Anoop P A wrote: > > >>> Hi , > > >>> > > >>> Following patch restricts TREE_CPU RCU implementation only for !PREEMPT > > >>> SMP kernel. > > >>> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366 > > >>> > > >>> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel > > >>> ( which will be only available RCU implementation for SMTC kernel from > > >>> 2.6.37 onwards) . > > >>> > > >>> With no forced preemption and selecting TREE_CPU I am able to boot > > >>> further to the hang that I have reported. > > >>> > > >>> Thanks > > >>> Anoop > > >>> > > >>> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote: > > >>>> At this point the logical thing to do would seem to look at your kernel > > >>>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0 > > >>>> shows the last exception to have been taken. That's a critical SMTC > > >>>> routine that gets called whenever an xxx_irq_restore() enables > > >>>> interrupts, so that virtual per-TC IPI interrupts that were posted while > > >>>> the TC had interrupts disabled can be handled deterministically. 
As I > > >>>> mentioned in an earlier message, there was some cleanup work from David > > >>>> Howell that changed a number of irq management-related function names > > >>>> and prototypes across all architectures, which went into linux-mips.org > > >>>> at very roughly the time of the breakage. The SMTC overlay over the irq > > >>>> implementation has been pretty robust, but it's written in a perhaps > > >>>> doomed attempt to be both efficient and using a maximum amount of common > > >>>> code with the general case. A mechanical or semi-mechanical change > > >>>> could conceivably have broken things. > > >>>> > > >>>> Regards, > > >>>> > > >>>> Kevin K. > > >>>> > > >>>> > > >>>> On 12/31/2010 4:27 AM, Anoop P A wrote: > > >>>>> Hi , > > >>>>> > > >>>>> Kernel hangs on stop_machine call. Please find mt reg dump below. > > >>>>> Another important observation is even though 2.6.33 kernel + stackframe > > >>>>> patch well passes calibration hang , I am still unable boot in to a > > >>>>> initramfs root ( verified ramfs working with VSMP). So it looks like > > >>>>> still some issue to fix between 2.6.32 and 2.6.33 . 
> > >>>>> ######################## Log ########################### > > >>>>> > > >>>>> === MIPS MT State Dump === > > >>>>> -- Global State -- > > >>>>> MVPControl Passed: 00000005 > > >>>>> MVPControl Read: 00000004 > > >>>>> MVPConf0 : a8008406 > > >>>>> -- per-VPE State -- > > >>>>> VPE 0 > > >>>>> VPEControl : 00008000 > > >>>>> VPEConf0 : 800f0003 > > >>>>> VPE0.Status : 11004201 > > >>>>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 > > >>>>> VPE0.Cause : 50804000 > > >>>>> VPE0.Config7 : 00010000 > > >>>>> VPE 1 > > >>>>> VPEControl : 00068006 > > >>>>> VPEConf0 : 80cf0003 > > >>>>> VPE1.Status : 11008301 > > >>>>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 > > >>>>> VPE1.Cause : 50800000 > > >>>>> VPE1.Config7 : 00010000 > > >>>>> -- per-TC State -- > > >>>>> TC 0 (current TC with VPE EPC above) > > >>>>> TCStatus : 18102000 > > >>>>> TCBind : 00000000 > > >>>>> TCRestart : 803fa19c printk+0xc/0x30 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 00000000 > > >>>>> TC 1 > > >>>>> TCStatus : 18902000 > > >>>>> TCBind : 00200000 > > >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 00140000 > > >>>>> TC 2 > > >>>>> TCStatus : 18902000 > > >>>>> TCBind : 00400000 > > >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 00280000 > > >>>>> TC 3 > > >>>>> TCStatus : 18902000 > > >>>>> TCBind : 00600000 > > >>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 003c0000 > > >>>>> TC 4 > > >>>>> TCStatus : 18902000 > > >>>>> TCBind : 00800001 > > >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 00500000 > > >>>>> TC 5 > > >>>>> TCStatus : 18902000 > > >>>>> TCBind : 00a00001 > > >>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 00640000 > > >>>>> TC 6 > > >>>>> TCStatus : 18902000 > > >>>>> TCBind : 00c00001 > > >>>>> TCRestart : 8010229c 
r4k_wait+0x1c/0x40 > > >>>>> TCHalt : 00000000 > > >>>>> TCContext : 00780000 > > >>>>> Counter Interrupts taken per CPU (TC) > > >>>>> 0: 0 > > >>>>> 1: 0 > > >>>>> 2: 0 > > >>>>> 3: 0 > > >>>>> 4: 0 > > >>>>> 5: 0 > > >>>>> 6: 0 > > >>>>> 7: 0 > > >>>>> Self-IPI invocations: > > >>>>> 0: 12 > > >>>>> 1: 0 > > >>>>> 2: 0 > > >>>>> 3: 0 > > >>>>> 4: 0 > > >>>>> 5: 5 > > >>>>> 6: 4 > > >>>>> 7: 0 > > >>>>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > > >>>>> 0 Recoveries of "stolen" FPU > > >>>>> =========================== > > >>>>> > > >>>>> ################################################################ > > >>>>> > > >>>>> Thanks > > >>>>> Anoop > > >>>>> > > >>>>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote: > > >>>>>> I took a quick look last night, and the only thing that looked vaguely > > >>>>>> dangerous in changes since the timer changes I alluded to earlier was > > >>>>>> the global naming cleanup of irq-related function names that David > > >>>>>> Howell submitted. The diff didn't look dangerous in itself, but some of > > >>>>>> the definitions are nested subtly for SMTC to maximize the amount of > > >>>>>> common code, and I could imagine something getting lost in translation > > >>>>>> there. If that were really the problem, it would of course affect much > > >>>>>> more than just the timer subsystem, but early in the boot process, > > >>>>>> timers are pretty much the only interrupts that have to be handled > > >>>>>> correctly. > > >>>>>> > > >>>>>> I'm travelling today, but will take a look at timekeeping_notify() > > >>>>>> tomorrow or the next day... 
> > >>>>>> > > >>>>>> /K. > > >>>>>> > > >>>>>> On 12/28/10 12:19 AM, Anoop P A wrote: > > >>>>>>> Hi, > > >>>>>>> > > >>>>>>> I had a glance into the code diff without notice of any suspect-able > > >>>>>>> code . > > >>>>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify > > >>>>>>> function. > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> Anoop > > >>>>>>> > > >>>>>>> PS: I may not be available until Thursday > > >>>>>>> > > >>>>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: > > >>>>>>>> Hi Kevin, > > >>>>>>>> > > >>>>>>>> It is very unlikely that the patch you pointed has any impact on the the > > >>>>>>>> hang I am seeing. The patch you have mentioned got into kernel around > > >>>>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + > > >>>>>>>> stackframe patch) . > > >>>>>>>> > > >>>>>>>> Hi Stuart, > > >>>>>>>> > > >>>>>>>> I haven't got much time to spend on this today. > > >>>>>>>> > > >>>>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have > > >>>>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) > > >>>>>>>> > > >>>>>>>> So probably some patches in 2.6.37 branch introduced this hang. > > >>>>>>>> > > >>>>>>>> Hopefully I will get some free slot tomorrow so that I can look into > > >>>>>>>> code diff . > > >>>>>>>> > > >>>>>>>> Thanks > > >>>>>>>> Anoop > > >>>>>>>> > > >>>>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: > > >>>>>>>>> Kevin, > > >>>>>>>>> > > >>>>>>>>> Outstanding, sometimes it's better to be lucky than good. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Anoop, > > >>>>>>>>> > > >>>>>>>>> Maybe we can get lucky again. > > >>>>>>>>> > > >>>>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, > > >>>>>>>>> I'll be happy to do another diff. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Hope you'll have had a good Christmas as well. > > >>>>>>>>> We've had snow in Alabama since Christmas eve! 
> > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Regards, > > >>>>>>>>> > > >>>>>>>>> Stuart > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> -----Original Message----- > > >>>>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] > > >>>>>>>>> Sent: Friday, December 24, 2010 5:34 PM > > >>>>>>>>> To: Anoop P A > > >>>>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org > > >>>>>>>>> Subject: Re: SMTC support status in latest git head. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's > > >>>>>>>>> performance tweak for the deeper pipelined processors. In looking for > > >>>>>>>>> this, I did notice that someone did some modification to the SMTC clock > > >>>>>>>>> tick logic that I was skeptical had ever been tested. If you've still > > >>>>>>>>> got that kernel binary handy, you might check to see if it boots with > > >>>>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > > >>>>>>>>> > > >>>>>>>>> Oh, yes, and Merry Christmas one and all! > > >>>>>>>>> > > >>>>>>>>> Regards, > > >>>>>>>>> > > >>>>>>>>> Kevin K. > > >>>>>>>>> > > >>>>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: > > >>>>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > > >>>>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also > > >>>>>>>>>>> fix things, while preserving the other fixes and performance enhancements? > > >>>>>>>>>>> > > >>>>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration > > >>>>>>>>>> loop but hangs after switching to mips closource > > >>>>>>>>>> > > >>>>>>>>>> TC 6 going on-line as CPU 6 > > >>>>>>>>>> Brought up 7 CPUs > > >>>>>>>>>> bio: create slab<bio-0> at 0 > > >>>>>>>>>> SCSI subsystem initialized > > >>>>>>>>>> Switching to clocksource MIPS > > >>>>>>>>>> > > >>>>>>>>>> I Presume this is a different issue as restoring older file didn't help > > >>>>>>>>>> much to get rid of this hang. 
> > >>>>>>>>>> > > >>>>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h > > >>>>>>>>>> b/arch/mips/include/asm/stackframe.h > > >>>>>>>>>> index 58730c5..7fc9f10 100644 > > >>>>>>>>>> --- a/arch/mips/include/asm/stackframe.h > > >>>>>>>>>> +++ b/arch/mips/include/asm/stackframe.h > > >>>>>>>>>> @@ -195,9 +195,9 @@ > > >>>>>>>>>> * to cover the pipeline delay. > > >>>>>>>>>> */ > > >>>>>>>>>> .set mips32 > > >>>>>>>>>> - mfc0 v1, CP0_TCSTATUS > > >>>>>>>>>> + mfc0 v0, CP0_TCSTATUS > > >>>>>>>>>> .set mips0 > > >>>>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) > > >>>>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) > > >>>>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ > > >>>>>>>>>> LONG_S $4, PT_R4(sp) > > >>>>>>>>>> LONG_S $5, PT_R5(sp) > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>> /K. > > >>>>>>>>>>> > > >>>>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: > > >>>>>>>>>>>> Hi Kevin, Stuart , > > >>>>>>>>>>>> > > >>>>>>>>>>>> Woohooo You guys spotted !. > > >>>>>>>>>>>> > > >>>>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > > >>>>>>>>>>>> the culprit > > >>>>>>>>>>>> > > >>>>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started > > >>>>>>>>>>>> booting !. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thanks, > > >>>>>>>>>>>> Anoop > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > > >>>>>>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between > > >>>>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved > > >>>>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > > >>>>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a > > >>>>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of > > >>>>>>>>>>>>> the register access. 
Unfortunately, the v1 register is also used in the > > >>>>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets > > >>>>>>>>>>>>> clobbered before it gets stored. This will eventually result in the > > >>>>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, > > >>>>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS > > >>>>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance > > >>>>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 > > >>>>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > > >>>>>>>>>>>>> lean toward the second option, but I'm not in a position to test and > > >>>>>>>>>>>>> submit a patch just now. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Regards, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Kevin K. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > > >>>>>>>>>>>>>> Kevin, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I'm not sure if it's useful, > > >>>>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
> > >>>>>>>>>>>>>> works 2.6.32-stable with patch 804 > > >>>>>>>>>>>>>> works_not 2.6.33-stable > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC > > >>>>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> arch/mips/include/asm/irq.h > > >>>>>>>>>>>>>> arch/mips/kernel/irq.c > > >>>>>>>>>>>>>> do_IRQ > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> arch/mips/include/asm/stackframe.h > > >>>>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> arch/mips/include/asm/time.h > > >>>>>>>>>>>>>> clocksource_set_clock > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> arch/mips/kernel/process.c > > >>>>>>>>>>>>>> cpu_idle > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> arch/mips/kernel/smtc.c > > >>>>>>>>>>>>>> __irq_entry > > >>>>>>>>>>>>>> ipi_decode > > >>>>>>>>>>>>>> SMTC_CLOCK_TICK > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I'll try to look in more detail after Christmas. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Stuart > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > > > > > ^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-04 14:37 ` Anoop P A @ 2011-01-04 17:21 ` Kevin D. Kissell 2011-01-04 17:54 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-04 17:21 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips

I'm trying to figure out a reason why your change below should help,
and offhand, modulo tool bugs, I don't see it. I'm assuming that your
diff below is a diff relative to the pre-patch stackframe.h. I wouldn't
bless it as an alternative because it moves code and comments
unnecessarily - all you should really have to do is to move the

190                 mfc0    v1, CP0_STATUS
191                 LONG_S  $2, PT_R2(sp)

to be just after the

#endif /* CONFIG_MIPS_MT_SMTC */

at around line 201. If moving the save of zero to PT_R0(sp) actually
makes a difference, it's evidence that you've got problems in your
toolchain (or, heaven forbid, your pipeline)!

But I'd really like to see what your assembler is doing to the original
patch for it to be broken. Assembler instruction reordering is armed,
but it ought not to move register moves and stores around in ways that
would cause your sequence

197                 .set    mips32
198                 mfc0    v1, CP0_TCSTATUS
199                 .set    mips0
200                 LONG_S  v1, PT_TCSTATUS(sp)
189                 LONG_S  $0, PT_R0(sp)
190                 mfc0    v1, CP0_STATUS
191                 LONG_S  $2, PT_R2(sp)
202                 LONG_S  $4, PT_R4(sp)
203                 LONG_S  $5, PT_R5(sp)
204                 LONG_S  v1, PT_STATUS(sp)

to work while

189                 LONG_S  $0, PT_R0(sp)
190                 mfc0    v1, CP0_STATUS
191                 LONG_S  $2, PT_R2(sp)
197                 .set    mips32
198                 mfc0    v0, CP0_TCSTATUS
199                 .set    mips0
200                 LONG_S  v0, PT_TCSTATUS(sp)
202                 LONG_S  $4, PT_R4(sp)
203                 LONG_S  $5, PT_R5(sp)
204                 LONG_S  v1, PT_STATUS(sp)

does not, provided that the identity of v0=$2, v1=$3 is respected.

One thing that does stick out as being different - though, again, I'd
need to see the disassembly of an instance of the macro to know what it
could have done - is that the SMTC conditional code brackets the mfc0 of
TCStatus with .set mips32/.set mips0.
Given that the code no longer has a .set mips0 early in the macro, it
would be more correct to make it:

                .set    push
                .set    mips32
                mfc0    v0, CP0_TCSTATUS   (or v1, if we move the mfc0 v1,CP0_STATUS)
                .set    pop

and presumably make a similar change for the block from line 334 to 429.
But I don't see any causal path from that funniness to failure.

Regards,

Kevin K.

On 01/04/11 06:37, Anoop P A wrote:
> Hi Kevin,
>
> the stackframe patch that you have suggested had some side effects I was
> unable execute init. When I changed some thing like below it started
> working .Could you kindly review it ?.
>
> diff --git a/arch/mips/include/asm/stackframe.h
> b/arch/mips/include/asm/stackframe.h
> index 58730c5..da786ed 100644
> --- a/arch/mips/include/asm/stackframe.h
> +++ b/arch/mips/include/asm/stackframe.h
> @@ -181,14 +181,6 @@
>  #endif
>                 LONG_S  k0, PT_R29(sp)
>                 LONG_S  $3, PT_R3(sp)
> -               /*
> -                * You might think that you don't need to save $0,
> -                * but the FPU emulator and gdb remote debug stub
> -                * need it to operate correctly
> -                */
> -               LONG_S  $0, PT_R0(sp)
> -               mfc0    v1, CP0_STATUS
> -               LONG_S  $2, PT_R2(sp)
>  #ifdef CONFIG_MIPS_MT_SMTC
>                 /*
>                  * Ideally, these instructions would be shuffled in
> @@ -199,6 +191,14 @@
>                 .set    mips0
>                 LONG_S  v1, PT_TCSTATUS(sp)
>  #endif /* CONFIG_MIPS_MT_SMTC */
> +               /*
> +                * You might think that you don't need to save $0,
> +                * but the FPU emulator and gdb remote debug stub
> +                * need it to operate correctly
> +                */
> +               LONG_S  $0, PT_R0(sp)
> +               mfc0    v1, CP0_STATUS
> +               LONG_S  $2, PT_R2(sp)
>                 LONG_S  $4, PT_R4(sp)
>                 LONG_S  $5, PT_R5(sp)
>                 LONG_S  v1, PT_STATUS(sp)
>
> Linux-2.6.37-rc7 boots all the way if I specify maxvpes=1 in command
> line.
> > / # cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > CPU6 > 1: 249 218024 218286 218263 218235 218208 > 218179 MIPS SMTC_IPI > 6: 0 0 0 0 0 0 > 0 MIPS MSP CIC cascade > 8: 0 0 0 0 0 0 > 0 MSP_CIC Softreset button > 9: 0 0 0 0 0 0 > 0 MSP_CIC Standby switch > 21: 0 0 0 0 0 0 > 0 MSP_CIC MSP PER cascade > 25: 218128 711 11 0 0 0 > 0 MSP_CIC timer > 27: 341 22 0 0 2 0 > 6 MSP_CIC serial > > ERR: 0 > / # uname -a > Linux (none) 2.6.37-rc7-pmc-00001-g9cff2d6-dirty #289 SMP PREEMPT Tue > Jan 4 19:48:31 IST 2011 mips GNU/Linux > > So clock setup / distribution on VPE1 is some thing need fix. > > Thanks > Anoop > > > On Tue, 2011-01-04 at 18:32 +0530, Anoop P A wrote: >> On Tue, 2011-01-04 at 00:17 -0800, Kevin D. Kissell wrote: >>> Those interrupt counters show that IPIs are being taken everywhere, >>> though very few by CPUs 5 and 6. If I understand the configuration >>> correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI >> Yes CPU4 is in second VPE >> >>> rate, *if* we're looking at a tickless kernel under low load. But there >> No it was not the tickless kernel.I had selected 250 MHz timer. can't we >> expect IPI / timer interrupt for all the threads in this case ?. >> >>> may be a clue there to part of your problem. I have no idea why the >>> behavior would have changed from 2.6.36 to 2.6.37, but it looks as if >>> you're getting your clock interrupts through the MSP CIC interrupt >>> controller on VPE 0. There's nothing symmetric for VPE1. The Malta >>> example code is perhaps deceptively simple, in that both VPEs have their >>> count/compare indication wired directly to the 2 clock interrupt inputs, >>> so that having both of them running with only a single set of irq state >>> just works. I don't know whether the MSP CIC timer interrupt is a >> In my case it is separate irq. MSP_INT_VPE1_TIMER (34) and >> MSP_INT_VPE0_TIMER (25) are wired to CIC . CIC interrupt has been >> connected to cpu irq 6. 
>> >> I can reproduce cpu stall in VSMP mode If I don't setup VPE1 timer >> interrupt . Don't we have support for separate irq in SMTC >> implementation ?.. >> >>> gating of the VPE0 count/compare output, or whether it's it's own >>> interval timer, but I suspect that you may need to do some further >>> low-level initialization in the platform-specific code to set up an >>> interrupt on the VPE1 side. I don't think the snippet you've got below >>> would work as written. >> The routine which I copied works fine for VSMP mode . >> >> / # cat /proc/interrupts >> CPU0 CPU1 >> 0: 187 254 MIPS IPI_resched >> 1: 77 174 MIPS IPI_call >> 6: 0 0 MIPS MSP CIC cascade >> 8: 0 0 MSP_CIC Softreset button >> 9: 0 0 MSP_CIC Standby switch >> 21: 0 0 MSP_CIC MSP PER cascade >> 25: 37077 0 MSP_CIC timer >> 27: 188 0 MSP_CIC serial >> 34: 0 36986 MSP_CIC timer >> >> Do I want to change anything specific for SMTC ? . >> >>> If it's purely an issue with clock distribution on VPE1, then a boot >>> with maxvpes=1 maxtcs=4 should be stable. >> Yes the kernel seems to be stable if I boot with maxvpes=1 maxtcs=4 . >> >>> /K. >>> >>> On 1/3/2011 11:20 AM, Anoop P A wrote: >>>> Hi Kevin, >>>> >>>> On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: >>>>> The very first SMTC implementations didn't support full kernel-mode >>>>> preemption, which anyway wasn't a priority, given the hardware event >>>>> response support in MIPS MT. I believe it was later made compatible, >>>>> but it was never extensively exercised. Since SMTC has fingers in some >>>>> pretty low-level atomicity mechanisms, if a new, parallel set was >>>>> implemented for RCU, I can easily imagine that nobody has yet >>>>> implemented SMTC-ified variants of that set. >>>>> >>>>> Your last statement isn't very clear, though. Are you saying that if >>>>> you configure for no forced preemption and with TREE_CPU, the 2.6.37 >>>>> kernel boots all the way up, or that it simply hangs later? 
What's the >>>>> last rev kernel that actually boots all the way up? >>>> I have debugged this a bit more. It seems that kernel getting stalled >>>> while executing on TC's of second VPE . >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>>> by 1, t=2504 jiffies) >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>>> by 1, t=10036 jiffies) >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>>> by 1, t=17568 jiffies) >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>>> by 1, t=25100 jiffies) >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>>> by 1, t=32632 jiffies) >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>>> by 1, t=40164 jiffies) >>>> >>>> With CONFIG_TREE_CPU we were not hitting this scenario very often. >>>> However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. >>>> >>>> I presume some issue in my timer setup . I am not seeing timer interrupt >>>> (or IPI interrupt) getting incremented for VPE1 tcs on a completely >>>> booted 2.6.32-stable kernel. >>>> >>>> / # cat /proc/interrupts >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 >>>> CPU6 >>>> 1: 148 15023 15140 15093 3779 8 >>>> 2 MIPS SMTC_IPI >>>> 6: 0 0 0 0 0 0 >>>> 0 MIPS MSP CIC cascade >>>> 8: 0 0 0 0 0 0 >>>> 0 MSP_CIC Softreset button >>>> 9: 0 0 0 0 0 0 >>>> 0 MSP_CIC Standby switch >>>> 21: 0 0 0 0 0 0 >>>> 0 MSP_CIC MSP PER cascade >>>> 25: 15113 341 4 7 0 0 >>>> 0 MSP_CIC timer >>>> 27: 260 9 0 1 0 0 >>>> 0 MSP_CIC serial >>>> 34: 0 0 0 0 0 0 >>>> 0 MSP_CIC timer >>>> >>>> Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. 
>>>> >>>> I have tried setting up VPE1 timer from get_co_compare_int as follows >>>> >>>> unsigned int __cpuinit get_c0_compare_int(void) >>>> { >>>> if ((1==get_current_vpe())&& !vpe1_timr_installed){ >>>> >>>> memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); >>>> >>>> setup_irq(MSP_INT_VPE1_TIMER,&timer_vpe1); >>>> vpe1_timr_installed++; >>>> } >>>> return (get_current_vpe() ? MSP_INT_VPE1_TIMER : >>>> MSP_INT_VPE0_TIMER); >>>> } >>>> >>>> Thanks >>>> Anoop >>>> >>>>> Regards, >>>>> >>>>> Kevin K. >>>>> >>>>> On 1/3/2011 7:12 AM, Anoop P A wrote: >>>>>> Hi , >>>>>> >>>>>> Following patch restricts TREE_CPU RCU implementation only for !PREEMPT >>>>>> SMP kernel. >>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366 >>>>>> >>>>>> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel >>>>>> ( which will be only available RCU implementation for SMTC kernel from >>>>>> 2.6.37 onwards) . >>>>>> >>>>>> With no forced preemption and selecting TREE_CPU I am able to boot >>>>>> further to the hang that I have reported. >>>>>> >>>>>> Thanks >>>>>> Anoop >>>>>> >>>>>> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote: >>>>>>> At this point the logical thing to do would seem to look at your kernel >>>>>>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0 >>>>>>> shows the last exception to have been taken. That's a critical SMTC >>>>>>> routine that gets called whenever an xxx_irq_restore() enables >>>>>>> interrupts, so that virtual per-TC IPI interrupts that were posted while >>>>>>> the TC had interrupts disabled can be handled deterministically. As I >>>>>>> mentioned in an earlier message, there was some cleanup work from David >>>>>>> Howell that changed a number of irq management-related function names >>>>>>> and prototypes across all architectures, which went into linux-mips.org >>>>>>> at very roughly the time of the breakage. 
The SMTC overlay over the irq >>>>>>> implementation has been pretty robust, but it's written in a perhaps >>>>>>> doomed attempt to be both efficient and using a maximum amount of common >>>>>>> code with the general case. A mechanical or semi-mechanical change >>>>>>> could conceivably have broken things. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Kevin K. >>>>>>> >>>>>>> >>>>>>> On 12/31/2010 4:27 AM, Anoop P A wrote: >>>>>>>> Hi , >>>>>>>> >>>>>>>> Kernel hangs on stop_machine call. Please find mt reg dump below. >>>>>>>> Another important observation is even though 2.6.33 kernel + stackframe >>>>>>>> patch well passes calibration hang , I am still unable boot in to a >>>>>>>> initramfs root ( verified ramfs working with VSMP). So it looks like >>>>>>>> still some issue to fix between 2.6.32 and 2.6.33 . >>>>>>>> ######################## Log ########################### >>>>>>>> >>>>>>>> === MIPS MT State Dump === >>>>>>>> -- Global State -- >>>>>>>> MVPControl Passed: 00000005 >>>>>>>> MVPControl Read: 00000004 >>>>>>>> MVPConf0 : a8008406 >>>>>>>> -- per-VPE State -- >>>>>>>> VPE 0 >>>>>>>> VPEControl : 00008000 >>>>>>>> VPEConf0 : 800f0003 >>>>>>>> VPE0.Status : 11004201 >>>>>>>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 >>>>>>>> VPE0.Cause : 50804000 >>>>>>>> VPE0.Config7 : 00010000 >>>>>>>> VPE 1 >>>>>>>> VPEControl : 00068006 >>>>>>>> VPEConf0 : 80cf0003 >>>>>>>> VPE1.Status : 11008301 >>>>>>>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 >>>>>>>> VPE1.Cause : 50800000 >>>>>>>> VPE1.Config7 : 00010000 >>>>>>>> -- per-TC State -- >>>>>>>> TC 0 (current TC with VPE EPC above) >>>>>>>> TCStatus : 18102000 >>>>>>>> TCBind : 00000000 >>>>>>>> TCRestart : 803fa19c printk+0xc/0x30 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 00000000 >>>>>>>> TC 1 >>>>>>>> TCStatus : 18902000 >>>>>>>> TCBind : 00200000 >>>>>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 00140000 >>>>>>>> TC 2 >>>>>>>> TCStatus : 18902000 >>>>>>>> 
TCBind : 00400000 >>>>>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 00280000 >>>>>>>> TC 3 >>>>>>>> TCStatus : 18902000 >>>>>>>> TCBind : 00600000 >>>>>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 003c0000 >>>>>>>> TC 4 >>>>>>>> TCStatus : 18902000 >>>>>>>> TCBind : 00800001 >>>>>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 00500000 >>>>>>>> TC 5 >>>>>>>> TCStatus : 18902000 >>>>>>>> TCBind : 00a00001 >>>>>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 00640000 >>>>>>>> TC 6 >>>>>>>> TCStatus : 18902000 >>>>>>>> TCBind : 00c00001 >>>>>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 >>>>>>>> TCHalt : 00000000 >>>>>>>> TCContext : 00780000 >>>>>>>> Counter Interrupts taken per CPU (TC) >>>>>>>> 0: 0 >>>>>>>> 1: 0 >>>>>>>> 2: 0 >>>>>>>> 3: 0 >>>>>>>> 4: 0 >>>>>>>> 5: 0 >>>>>>>> 6: 0 >>>>>>>> 7: 0 >>>>>>>> Self-IPI invocations: >>>>>>>> 0: 12 >>>>>>>> 1: 0 >>>>>>>> 2: 0 >>>>>>>> 3: 0 >>>>>>>> 4: 0 >>>>>>>> 5: 5 >>>>>>>> 6: 4 >>>>>>>> 7: 0 >>>>>>>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 >>>>>>>> 0 Recoveries of "stolen" FPU >>>>>>>> =========================== >>>>>>>> >>>>>>>> ################################################################ >>>>>>>> >>>>>>>> Thanks >>>>>>>> Anoop >>>>>>>> >>>>>>>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. 
Kissell wrote: >>>>>>>>> I took a quick look last night, and the only thing that looked vaguely >>>>>>>>> dangerous in changes since the timer changes I alluded to earlier was >>>>>>>>> the global naming cleanup of irq-related function names that David >>>>>>>>> Howell submitted. The diff didn't look dangerous in itself, but some of >>>>>>>>> the definitions are nested subtly for SMTC to maximize the amount of >>>>>>>>> common code, and I could imagine something getting lost in translation >>>>>>>>> there. If that were really the problem, it would of course affect much >>>>>>>>> more than just the timer subsystem, but early in the boot process, >>>>>>>>> timers are pretty much the only interrupts that have to be handled >>>>>>>>> correctly. >>>>>>>>> >>>>>>>>> I'm travelling today, but will take a look at timekeeping_notify() >>>>>>>>> tomorrow or the next day... >>>>>>>>> >>>>>>>>> /K. >>>>>>>>> >>>>>>>>> On 12/28/10 12:19 AM, Anoop P A wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I had a glance into the code diff without notice of any suspect-able >>>>>>>>>> code . >>>>>>>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify >>>>>>>>>> function. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Anoop >>>>>>>>>> >>>>>>>>>> PS: I may not be available until Thursday >>>>>>>>>> >>>>>>>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: >>>>>>>>>>> Hi Kevin, >>>>>>>>>>> >>>>>>>>>>> It is very unlikely that the patch you pointed has any impact on the the >>>>>>>>>>> hang I am seeing. The patch you have mentioned got into kernel around >>>>>>>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + >>>>>>>>>>> stackframe patch) . >>>>>>>>>>> >>>>>>>>>>> Hi Stuart, >>>>>>>>>>> >>>>>>>>>>> I haven't got much time to spend on this today. 
>>>>>>>>>>> >>>>>>>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have >>>>>>>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) >>>>>>>>>>> >>>>>>>>>>> So probably some patches in 2.6.37 branch introduced this hang. >>>>>>>>>>> >>>>>>>>>>> Hopefully I will get some free slot tomorrow so that I can look into >>>>>>>>>>> code diff . >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Anoop >>>>>>>>>>> >>>>>>>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: >>>>>>>>>>>> Kevin, >>>>>>>>>>>> >>>>>>>>>>>> Outstanding, sometimes it's better to be lucky than good. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Anoop, >>>>>>>>>>>> >>>>>>>>>>>> Maybe we can get lucky again. >>>>>>>>>>>> >>>>>>>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, >>>>>>>>>>>> I'll be happy to do another diff. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hope you'll have had a good Christmas as well. >>>>>>>>>>>> We've had snow in Alabama since Christmas eve! >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> >>>>>>>>>>>> Stuart >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] >>>>>>>>>>>> Sent: Friday, December 24, 2010 5:34 PM >>>>>>>>>>>> To: Anoop P A >>>>>>>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org >>>>>>>>>>>> Subject: Re: SMTC support status in latest git head. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's >>>>>>>>>>>> performance tweak for the deeper pipelined processors. In looking for >>>>>>>>>>>> this, I did notice that someone did some modification to the SMTC clock >>>>>>>>>>>> tick logic that I was skeptical had ever been tested. If you've still >>>>>>>>>>>> got that kernel binary handy, you might check to see if it boots with >>>>>>>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. 
>>>>>>>>>>>> >>>>>>>>>>>> Oh, yes, and Merry Christmas one and all! >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> >>>>>>>>>>>> Kevin K. >>>>>>>>>>>> >>>>>>>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: >>>>>>>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: >>>>>>>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also >>>>>>>>>>>>>> fix things, while preserving the other fixes and performance enhancements? >>>>>>>>>>>>>> >>>>>>>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration >>>>>>>>>>>>> loop but hangs after switching to mips closource >>>>>>>>>>>>> >>>>>>>>>>>>> TC 6 going on-line as CPU 6 >>>>>>>>>>>>> Brought up 7 CPUs >>>>>>>>>>>>> bio: create slab<bio-0> at 0 >>>>>>>>>>>>> SCSI subsystem initialized >>>>>>>>>>>>> Switching to clocksource MIPS >>>>>>>>>>>>> >>>>>>>>>>>>> I Presume this is a different issue as restoring older file didn't help >>>>>>>>>>>>> much to get rid of this hang. >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h >>>>>>>>>>>>> b/arch/mips/include/asm/stackframe.h >>>>>>>>>>>>> index 58730c5..7fc9f10 100644 >>>>>>>>>>>>> --- a/arch/mips/include/asm/stackframe.h >>>>>>>>>>>>> +++ b/arch/mips/include/asm/stackframe.h >>>>>>>>>>>>> @@ -195,9 +195,9 @@ >>>>>>>>>>>>> * to cover the pipeline delay. >>>>>>>>>>>>> */ >>>>>>>>>>>>> .set mips32 >>>>>>>>>>>>> - mfc0 v1, CP0_TCSTATUS >>>>>>>>>>>>> + mfc0 v0, CP0_TCSTATUS >>>>>>>>>>>>> .set mips0 >>>>>>>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) >>>>>>>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) >>>>>>>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ >>>>>>>>>>>>> LONG_S $4, PT_R4(sp) >>>>>>>>>>>>> LONG_S $5, PT_R5(sp) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> /K. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: >>>>>>>>>>>>>>> Hi Kevin, Stuart , >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Woohooo You guys spotted !. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be >>>>>>>>>>>>>>> the culprit >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started >>>>>>>>>>>>>>> booting !. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Anoop >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: >>>>>>>>>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between >>>>>>>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved >>>>>>>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, >>>>>>>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a >>>>>>>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of >>>>>>>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the >>>>>>>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets >>>>>>>>>>>>>>>> clobbered before it gets stored. This will eventually result in the >>>>>>>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, >>>>>>>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS >>>>>>>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance >>>>>>>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 >>>>>>>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd >>>>>>>>>>>>>>>> lean toward the second option, but I'm not in a position to test and >>>>>>>>>>>>>>>> submit a patch just now. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Kevin K. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: >>>>>>>>>>>>>>>>> Kevin, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm not sure if it's useful, >>>>>>>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. >>>>>>>>>>>>>>>>> works 2.6.32-stable with patch 804 >>>>>>>>>>>>>>>>> works_not 2.6.33-stable >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC >>>>>>>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> arch/mips/include/asm/irq.h >>>>>>>>>>>>>>>>> arch/mips/kernel/irq.c >>>>>>>>>>>>>>>>> do_IRQ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> arch/mips/include/asm/stackframe.h >>>>>>>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> arch/mips/include/asm/time.h >>>>>>>>>>>>>>>>> clocksource_set_clock >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> arch/mips/kernel/process.c >>>>>>>>>>>>>>>>> cpu_idle >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> arch/mips/kernel/smtc.c >>>>>>>>>>>>>>>>> __irq_entry >>>>>>>>>>>>>>>>> ipi_decode >>>>>>>>>>>>>>>>> SMTC_CLOCK_TICK >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'll try to look in more detail after Christmas. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Stuart >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > > ^ permalink raw reply [flat|nested] 68+ messages in thread
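The register clobber described in the quoted 12/24 message (Status staged in v1, then v1 reused for TCStatus before Status is stored) can be modelled in plain C. This is only a sketch under stated assumptions, not kernel code: the struct and function names below are hypothetical, the "registers" are ordinary local variables, and the sample values are simply the VPE0.Status and TC 0 TCStatus words from the MT state dump quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two CP0 registers read by SAVE_SOME. */
struct cp0_regs {
    uint32_t status;
    uint32_t tcstatus;
};

/* Hypothetical stand-ins for the PT_STATUS and PT_TCSTATUS frame slots. */
struct saved_frame {
    uint32_t pt_status;
    uint32_t pt_tcstatus;
};

/*
 * Ordering after the pipelining change: the store of Status is deferred
 * past the SMTC fragment, which reuses v1 as its staging register.
 */
static struct saved_frame save_clobbered(struct cp0_regs c)
{
    struct saved_frame f;
    uint32_t v1;

    v1 = c.status;        /* mfc0   v1, CP0_STATUS                  */
    v1 = c.tcstatus;      /* mfc0   v1, CP0_TCSTATUS (overwrites)   */
    f.pt_tcstatus = v1;   /* LONG_S v1, PT_TCSTATUS(sp)             */
    f.pt_status = v1;     /* LONG_S v1, PT_STATUS(sp): wrong value  */
    return f;
}

/*
 * Variant suggested in the thread: stage TCStatus in v0, which has
 * already been saved to PT_R2 at this point in the quoted sequence.
 */
static struct saved_frame save_with_v0(struct cp0_regs c)
{
    struct saved_frame f;
    uint32_t v0, v1;

    v1 = c.status;        /* mfc0   v1, CP0_STATUS                  */
    v0 = c.tcstatus;      /* mfc0   v0, CP0_TCSTATUS                */
    f.pt_tcstatus = v0;   /* LONG_S v0, PT_TCSTATUS(sp)             */
    f.pt_status = v1;     /* LONG_S v1, PT_STATUS(sp)               */
    return f;
}

/* Returns 1 when both frame slots hold the values that were in CP0. */
static int frame_is_correct(struct cp0_regs c, struct saved_frame f)
{
    return f.pt_status == c.status && f.pt_tcstatus == c.tcstatus;
}
```

With Status = 0x11004201 and TCStatus = 0x18102000, save_clobbered() leaves the TCStatus word in the PT_STATUS slot, while save_with_v0() keeps both slots intact. The model says nothing about assembler reordering or pipeline hazards, which are the open questions in the follow-up messages.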
* Re: SMTC support status in latest git head. 2011-01-04 17:21 ` Kevin D. Kissell @ 2011-01-04 17:54 ` Anoop P A 2011-01-04 18:33 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread From: Anoop P A @ 2011-01-04 17:54 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips On Tue, 2011-01-04 at 09:21 -0800, Kevin D. Kissell wrote: > I'm trying to figure out a reason why your change below should help, and > offhand, modulo tool bugs, I don't see it. I'm assuming that your diff > below is a diff relative to the pre-patch stackframe.h. I wouldn't Yes, the patch was created against the stock code. > bless it as an alternative because it moves code and comments > unnecessarily - all you should really have to do is to move the > > > 190 mfc0 v1, CP0_STATUS > 191 LONG_S $2, PT_R2(sp) > > to be just after the #endif /* CONFIG_MIPS_MT_SMTC */ at around line 201. Actually, I just moved the code under CONFIG_MIPS_MT_SMTC above the preceding block of code (which stores $0). git diff did the rest on my behalf :) > > If moving the save of zero to PT_R0(sp) actually makes a difference, > it's evidence that you've got problems in your toolchain (or, heaven > forbid, your pipeline)! In the previous version of the patch, the use of v0 was creating the issue. I have verified this with the previous version of the code (the working code from before David's instruction rearrangement patch). > > But I'd really like to see what your assembler is doing to the original > patch for it to be broken. 
Assembler instruction reordering is armed, > but it ought not to move register moves and stores around in ways that would cause > your sequence > > 197 .set mips32 > 198 mfc0 v1, CP0_TCSTATUS > 199 .set mips0 > 200 LONG_S v1, PT_TCSTATUS(sp) > 189 LONG_S $0, PT_R0(sp) > 190 mfc0 v1, CP0_STATUS > 191 LONG_S $2, PT_R2(sp) > 202 LONG_S $4, PT_R4(sp) > 203 LONG_S $5, PT_R5(sp) > 204 LONG_S v1, PT_STATUS(sp) > > to work while > > 189 LONG_S $0, PT_R0(sp) > 190 mfc0 v1, CP0_STATUS > 191 LONG_S $2, PT_R2(sp) > 197 .set mips32 > 198 mfc0 v0, CP0_TCSTATUS > 199 .set mips0 > 200 LONG_S v0, PT_TCSTATUS(sp) > 202 LONG_S $4, PT_R4(sp) > 203 LONG_S $5, PT_R5(sp) > 204 LONG_S v1, PT_STATUS(sp) > > does not, provided that the identity of v0=$2, v1=$3 is respected. > > One thing that does stick out as being different - though, again, I'd > need to see the disassembly of an instance of the macro to know what it > could have done - is that the SMTC conditional code brackets the mfc0 of > TCStatus with .set mips32/.set mips0. Given that the code no longer has > a .set mips0 early in the macro, it would be more correct to make it: > > .set push > .set mips32 > mfc0 v0, CP0_TCSTATUS (or v1, if we move the mfc0 > v1,CP0_STATUS) > .set pop > > and presumably make a similar change for the block from line 334 to 429. > > But I don't see any causal path from that funniness to failure. > > Regards, > > Kevin K. > > On 01/04/11 06:37, Anoop P A wrote: > > Hi Kevin, > > > > the stackframe patch that you have suggested had some side effects: I was > > unable to execute init. When I changed it to something like the below, it started > > working. Could you kindly review it? 
> > > > diff --git a/arch/mips/include/asm/stackframe.h > > b/arch/mips/include/asm/stackframe.h > > index 58730c5..da786ed 100644 > > --- a/arch/mips/include/asm/stackframe.h > > +++ b/arch/mips/include/asm/stackframe.h > > @@ -181,14 +181,6 @@ > > #endif > > LONG_S k0, PT_R29(sp) > > LONG_S $3, PT_R3(sp) > > - /* > > - * You might think that you don't need to save $0, > > - * but the FPU emulator and gdb remote debug stub > > - * need it to operate correctly > > - */ > > - LONG_S $0, PT_R0(sp) > > - mfc0 v1, CP0_STATUS > > - LONG_S $2, PT_R2(sp) > > #ifdef CONFIG_MIPS_MT_SMTC > > /* > > * Ideally, these instructions would be shuffled in > > @@ -199,6 +191,14 @@ > > .set mips0 > > LONG_S v1, PT_TCSTATUS(sp) > > #endif /* CONFIG_MIPS_MT_SMTC */ > > + /* > > + * You might think that you don't need to save $0, > > + * but the FPU emulator and gdb remote debug stub > > + * need it to operate correctly > > + */ > > + LONG_S $0, PT_R0(sp) > > + mfc0 v1, CP0_STATUS > > + LONG_S $2, PT_R2(sp) > > LONG_S $4, PT_R4(sp) > > LONG_S $5, PT_R5(sp) > > LONG_S v1, PT_STATUS(sp) > > > > Linux-2.6.37-rc7 boots all the way if I specify maxvpes=1 in command > > line. > > > > / # cat /proc/interrupts > > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > > CPU6 > > 1: 249 218024 218286 218263 218235 218208 > > 218179 MIPS SMTC_IPI > > 6: 0 0 0 0 0 0 > > 0 MIPS MSP CIC cascade > > 8: 0 0 0 0 0 0 > > 0 MSP_CIC Softreset button > > 9: 0 0 0 0 0 0 > > 0 MSP_CIC Standby switch > > 21: 0 0 0 0 0 0 > > 0 MSP_CIC MSP PER cascade > > 25: 218128 711 11 0 0 0 > > 0 MSP_CIC timer > > 27: 341 22 0 0 2 0 > > 6 MSP_CIC serial > > > > ERR: 0 > > / # uname -a > > Linux (none) 2.6.37-rc7-pmc-00001-g9cff2d6-dirty #289 SMP PREEMPT Tue > > Jan 4 19:48:31 IST 2011 mips GNU/Linux > > > > So clock setup / distribution on VPE1 is some thing need fix. > > > > Thanks > > Anoop > > > > > > On Tue, 2011-01-04 at 18:32 +0530, Anoop P A wrote: > >> On Tue, 2011-01-04 at 00:17 -0800, Kevin D. 
Kissell wrote: > >>> Those interrupt counters show that IPIs are being taken everywhere, > >>> though very few by CPUs 5 and 6. If I understand the configuration > >>> correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI > >> Yes CPU4 is in second VPE > >> > >>> rate, *if* we're looking at a tickless kernel under low load. But there > >> No it was not the tickless kernel.I had selected 250 MHz timer. can't we > >> expect IPI / timer interrupt for all the threads in this case ?. > >> > >>> may be a clue there to part of your problem. I have no idea why the > >>> behavior would have changed from 2.6.36 to 2.6.37, but it looks as if > >>> you're getting your clock interrupts through the MSP CIC interrupt > >>> controller on VPE 0. There's nothing symmetric for VPE1. The Malta > >>> example code is perhaps deceptively simple, in that both VPEs have their > >>> count/compare indication wired directly to the 2 clock interrupt inputs, > >>> so that having both of them running with only a single set of irq state > >>> just works. I don't know whether the MSP CIC timer interrupt is a > >> In my case it is separate irq. MSP_INT_VPE1_TIMER (34) and > >> MSP_INT_VPE0_TIMER (25) are wired to CIC . CIC interrupt has been > >> connected to cpu irq 6. > >> > >> I can reproduce cpu stall in VSMP mode If I don't setup VPE1 timer > >> interrupt . Don't we have support for separate irq in SMTC > >> implementation ?.. > >> > >>> gating of the VPE0 count/compare output, or whether it's it's own > >>> interval timer, but I suspect that you may need to do some further > >>> low-level initialization in the platform-specific code to set up an > >>> interrupt on the VPE1 side. I don't think the snippet you've got below > >>> would work as written. > >> The routine which I copied works fine for VSMP mode . 
> >> > >> / # cat /proc/interrupts > >> CPU0 CPU1 > >> 0: 187 254 MIPS IPI_resched > >> 1: 77 174 MIPS IPI_call > >> 6: 0 0 MIPS MSP CIC cascade > >> 8: 0 0 MSP_CIC Softreset button > >> 9: 0 0 MSP_CIC Standby switch > >> 21: 0 0 MSP_CIC MSP PER cascade > >> 25: 37077 0 MSP_CIC timer > >> 27: 188 0 MSP_CIC serial > >> 34: 0 36986 MSP_CIC timer > >> > >> Do I want to change anything specific for SMTC ? . > >> > >>> If it's purely an issue with clock distribution on VPE1, then a boot > >>> with maxvpes=1 maxtcs=4 should be stable. > >> Yes the kernel seems to be stable if I boot with maxvpes=1 maxtcs=4 . > >> > >>> /K. > >>> > >>> On 1/3/2011 11:20 AM, Anoop P A wrote: > >>>> Hi Kevin, > >>>> > >>>> On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: > >>>>> The very first SMTC implementations didn't support full kernel-mode > >>>>> preemption, which anyway wasn't a priority, given the hardware event > >>>>> response support in MIPS MT. I believe it was later made compatible, > >>>>> but it was never extensively exercised. Since SMTC has fingers in some > >>>>> pretty low-level atomicity mechanisms, if a new, parallel set was > >>>>> implemented for RCU, I can easily imagine that nobody has yet > >>>>> implemented SMTC-ified variants of that set. > >>>>> > >>>>> Your last statement isn't very clear, though. Are you saying that if > >>>>> you configure for no forced preemption and with TREE_CPU, the 2.6.37 > >>>>> kernel boots all the way up, or that it simply hangs later? What's the > >>>>> last rev kernel that actually boots all the way up? > >>>> I have debugged this a bit more. It seems that kernel getting stalled > >>>> while executing on TC's of second VPE . 
> >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>>> by 1, t=2504 jiffies) > >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>>> by 1, t=10036 jiffies) > >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>>> by 1, t=17568 jiffies) > >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>>> by 1, t=25100 jiffies) > >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>>> by 1, t=32632 jiffies) > >>>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>>> by 1, t=40164 jiffies) > >>>> > >>>> With CONFIG_TREE_CPU we were not hitting this scenario very often. > >>>> However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. > >>>> > >>>> I presume some issue in my timer setup . I am not seeing timer interrupt > >>>> (or IPI interrupt) getting incremented for VPE1 tcs on a completely > >>>> booted 2.6.32-stable kernel. > >>>> > >>>> / # cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > >>>> CPU6 > >>>> 1: 148 15023 15140 15093 3779 8 > >>>> 2 MIPS SMTC_IPI > >>>> 6: 0 0 0 0 0 0 > >>>> 0 MIPS MSP CIC cascade > >>>> 8: 0 0 0 0 0 0 > >>>> 0 MSP_CIC Softreset button > >>>> 9: 0 0 0 0 0 0 > >>>> 0 MSP_CIC Standby switch > >>>> 21: 0 0 0 0 0 0 > >>>> 0 MSP_CIC MSP PER cascade > >>>> 25: 15113 341 4 7 0 0 > >>>> 0 MSP_CIC timer > >>>> 27: 260 9 0 1 0 0 > >>>> 0 MSP_CIC serial > >>>> 34: 0 0 0 0 0 0 > >>>> 0 MSP_CIC timer > >>>> > >>>> Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. 
> >>>> > >>>> I have tried setting up VPE1 timer from get_co_compare_int as follows > >>>> > >>>> unsigned int __cpuinit get_c0_compare_int(void) > >>>> { > >>>> if ((1==get_current_vpe())&& !vpe1_timr_installed){ > >>>> > >>>> memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); > >>>> > >>>> setup_irq(MSP_INT_VPE1_TIMER,&timer_vpe1); > >>>> vpe1_timr_installed++; > >>>> } > >>>> return (get_current_vpe() ? MSP_INT_VPE1_TIMER : > >>>> MSP_INT_VPE0_TIMER); > >>>> } > >>>> > >>>> Thanks > >>>> Anoop > >>>> > >>>>> Regards, > >>>>> > >>>>> Kevin K. > >>>>> > >>>>> On 1/3/2011 7:12 AM, Anoop P A wrote: > >>>>>> Hi , > >>>>>> > >>>>>> Following patch restricts TREE_CPU RCU implementation only for !PREEMPT > >>>>>> SMP kernel. > >>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366 > >>>>>> > >>>>>> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel > >>>>>> ( which will be only available RCU implementation for SMTC kernel from > >>>>>> 2.6.37 onwards) . > >>>>>> > >>>>>> With no forced preemption and selecting TREE_CPU I am able to boot > >>>>>> further to the hang that I have reported. > >>>>>> > >>>>>> Thanks > >>>>>> Anoop > >>>>>> > >>>>>> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote: > >>>>>>> At this point the logical thing to do would seem to look at your kernel > >>>>>>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0 > >>>>>>> shows the last exception to have been taken. That's a critical SMTC > >>>>>>> routine that gets called whenever an xxx_irq_restore() enables > >>>>>>> interrupts, so that virtual per-TC IPI interrupts that were posted while > >>>>>>> the TC had interrupts disabled can be handled deterministically. 
As I > >>>>>>> mentioned in an earlier message, there was some cleanup work from David > >>>>>>> Howell that changed a number of irq management-related function names > >>>>>>> and prototypes across all architectures, which went into linux-mips.org > >>>>>>> at very roughly the time of the breakage. The SMTC overlay over the irq > >>>>>>> implementation has been pretty robust, but it's written in a perhaps > >>>>>>> doomed attempt to be both efficient and using a maximum amount of common > >>>>>>> code with the general case. A mechanical or semi-mechanical change > >>>>>>> could conceivably have broken things. > >>>>>>> > >>>>>>> Regards, > >>>>>>> > >>>>>>> Kevin K. > >>>>>>> > >>>>>>> > >>>>>>> On 12/31/2010 4:27 AM, Anoop P A wrote: > >>>>>>>> Hi , > >>>>>>>> > >>>>>>>> Kernel hangs on stop_machine call. Please find mt reg dump below. > >>>>>>>> Another important observation is even though 2.6.33 kernel + stackframe > >>>>>>>> patch well passes calibration hang , I am still unable boot in to a > >>>>>>>> initramfs root ( verified ramfs working with VSMP). So it looks like > >>>>>>>> still some issue to fix between 2.6.32 and 2.6.33 . 
> >>>>>>>> ######################## Log ########################### > >>>>>>>> > >>>>>>>> === MIPS MT State Dump === > >>>>>>>> -- Global State -- > >>>>>>>> MVPControl Passed: 00000005 > >>>>>>>> MVPControl Read: 00000004 > >>>>>>>> MVPConf0 : a8008406 > >>>>>>>> -- per-VPE State -- > >>>>>>>> VPE 0 > >>>>>>>> VPEControl : 00008000 > >>>>>>>> VPEConf0 : 800f0003 > >>>>>>>> VPE0.Status : 11004201 > >>>>>>>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108 > >>>>>>>> VPE0.Cause : 50804000 > >>>>>>>> VPE0.Config7 : 00010000 > >>>>>>>> VPE 1 > >>>>>>>> VPEControl : 00068006 > >>>>>>>> VPEConf0 : 80cf0003 > >>>>>>>> VPE1.Status : 11008301 > >>>>>>>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40 > >>>>>>>> VPE1.Cause : 50800000 > >>>>>>>> VPE1.Config7 : 00010000 > >>>>>>>> -- per-TC State -- > >>>>>>>> TC 0 (current TC with VPE EPC above) > >>>>>>>> TCStatus : 18102000 > >>>>>>>> TCBind : 00000000 > >>>>>>>> TCRestart : 803fa19c printk+0xc/0x30 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 00000000 > >>>>>>>> TC 1 > >>>>>>>> TCStatus : 18902000 > >>>>>>>> TCBind : 00200000 > >>>>>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 00140000 > >>>>>>>> TC 2 > >>>>>>>> TCStatus : 18902000 > >>>>>>>> TCBind : 00400000 > >>>>>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 00280000 > >>>>>>>> TC 3 > >>>>>>>> TCStatus : 18902000 > >>>>>>>> TCBind : 00600000 > >>>>>>>> TCRestart : 801022a0 r4k_wait+0x20/0x40 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 003c0000 > >>>>>>>> TC 4 > >>>>>>>> TCStatus : 18902000 > >>>>>>>> TCBind : 00800001 > >>>>>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 00500000 > >>>>>>>> TC 5 > >>>>>>>> TCStatus : 18902000 > >>>>>>>> TCBind : 00a00001 > >>>>>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 00640000 > >>>>>>>> TC 6 > >>>>>>>> TCStatus : 18902000 > 
>>>>>>>> TCBind : 00c00001 > >>>>>>>> TCRestart : 8010229c r4k_wait+0x1c/0x40 > >>>>>>>> TCHalt : 00000000 > >>>>>>>> TCContext : 00780000 > >>>>>>>> Counter Interrupts taken per CPU (TC) > >>>>>>>> 0: 0 > >>>>>>>> 1: 0 > >>>>>>>> 2: 0 > >>>>>>>> 3: 0 > >>>>>>>> 4: 0 > >>>>>>>> 5: 0 > >>>>>>>> 6: 0 > >>>>>>>> 7: 0 > >>>>>>>> Self-IPI invocations: > >>>>>>>> 0: 12 > >>>>>>>> 1: 0 > >>>>>>>> 2: 0 > >>>>>>>> 3: 0 > >>>>>>>> 4: 0 > >>>>>>>> 5: 5 > >>>>>>>> 6: 4 > >>>>>>>> 7: 0 > >>>>>>>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > >>>>>>>> 0 Recoveries of "stolen" FPU > >>>>>>>> =========================== > >>>>>>>> > >>>>>>>> ################################################################ > >>>>>>>> > >>>>>>>> Thanks > >>>>>>>> Anoop > >>>>>>>> > >>>>>>>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote: > >>>>>>>>> I took a quick look last night, and the only thing that looked vaguely > >>>>>>>>> dangerous in changes since the timer changes I alluded to earlier was > >>>>>>>>> the global naming cleanup of irq-related function names that David > >>>>>>>>> Howell submitted. The diff didn't look dangerous in itself, but some of > >>>>>>>>> the definitions are nested subtly for SMTC to maximize the amount of > >>>>>>>>> common code, and I could imagine something getting lost in translation > >>>>>>>>> there. If that were really the problem, it would of course affect much > >>>>>>>>> more than just the timer subsystem, but early in the boot process, > >>>>>>>>> timers are pretty much the only interrupts that have to be handled > >>>>>>>>> correctly. 
> >>>>>>>>> > >>>>>>>>> I'm travelling today, but will take a look at timekeeping_notify() > >>>>>>>>> tomorrow or the next day... > >>>>>>>>> > >>>>>>>>> /K. > >>>>>>>>> > >>>>>>>>> On 12/28/10 12:19 AM, Anoop P A wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> I had a glance into the code diff without notice of any suspect-able > >>>>>>>>>> code . > >>>>>>>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify > >>>>>>>>>> function. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Anoop > >>>>>>>>>> > >>>>>>>>>> PS: I may not be available until Thursday > >>>>>>>>>> > >>>>>>>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote: > >>>>>>>>>>> Hi Kevin, > >>>>>>>>>>> > >>>>>>>>>>> It is very unlikely that the patch you pointed has any impact on the the > >>>>>>>>>>> hang I am seeing. The patch you have mentioned got into kernel around > >>>>>>>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( + > >>>>>>>>>>> stackframe patch) . > >>>>>>>>>>> > >>>>>>>>>>> Hi Stuart, > >>>>>>>>>>> > >>>>>>>>>>> I haven't got much time to spend on this today. > >>>>>>>>>>> > >>>>>>>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have > >>>>>>>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head) > >>>>>>>>>>> > >>>>>>>>>>> So probably some patches in 2.6.37 branch introduced this hang. > >>>>>>>>>>> > >>>>>>>>>>> Hopefully I will get some free slot tomorrow so that I can look into > >>>>>>>>>>> code diff . > >>>>>>>>>>> > >>>>>>>>>>> Thanks > >>>>>>>>>>> Anoop > >>>>>>>>>>> > >>>>>>>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote: > >>>>>>>>>>>> Kevin, > >>>>>>>>>>>> > >>>>>>>>>>>> Outstanding, sometimes it's better to be lucky than good. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Anoop, > >>>>>>>>>>>> > >>>>>>>>>>>> Maybe we can get lucky again. 
> >>>>>>>>>>>> > >>>>>>>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions, > >>>>>>>>>>>> I'll be happy to do another diff. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Hope you'll have had a good Christmas as well. > >>>>>>>>>>>> We've had snow in Alabama since Christmas eve! > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> > >>>>>>>>>>>> Stuart > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -----Original Message----- > >>>>>>>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com] > >>>>>>>>>>>> Sent: Friday, December 24, 2010 5:34 PM > >>>>>>>>>>>> To: Anoop P A > >>>>>>>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org > >>>>>>>>>>>> Subject: Re: SMTC support status in latest git head. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's > >>>>>>>>>>>> performance tweak for the deeper pipelined processors. In looking for > >>>>>>>>>>>> this, I did notice that someone did some modification to the SMTC clock > >>>>>>>>>>>> tick logic that I was skeptical had ever been tested. If you've still > >>>>>>>>>>>> got that kernel binary handy, you might check to see if it boots with > >>>>>>>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2. > >>>>>>>>>>>> > >>>>>>>>>>>> Oh, yes, and Merry Christmas one and all! > >>>>>>>>>>>> > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> > >>>>>>>>>>>> Kevin K. > >>>>>>>>>>>> > >>>>>>>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote: > >>>>>>>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote: > >>>>>>>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also > >>>>>>>>>>>>>> fix things, while preserving the other fixes and performance enhancements? 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration > >>>>>>>>>>>>> loop but hangs after switching to mips closource > >>>>>>>>>>>>> > >>>>>>>>>>>>> TC 6 going on-line as CPU 6 > >>>>>>>>>>>>> Brought up 7 CPUs > >>>>>>>>>>>>> bio: create slab<bio-0> at 0 > >>>>>>>>>>>>> SCSI subsystem initialized > >>>>>>>>>>>>> Switching to clocksource MIPS > >>>>>>>>>>>>> > >>>>>>>>>>>>> I Presume this is a different issue as restoring older file didn't help > >>>>>>>>>>>>> much to get rid of this hang. > >>>>>>>>>>>>> > >>>>>>>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h > >>>>>>>>>>>>> b/arch/mips/include/asm/stackframe.h > >>>>>>>>>>>>> index 58730c5..7fc9f10 100644 > >>>>>>>>>>>>> --- a/arch/mips/include/asm/stackframe.h > >>>>>>>>>>>>> +++ b/arch/mips/include/asm/stackframe.h > >>>>>>>>>>>>> @@ -195,9 +195,9 @@ > >>>>>>>>>>>>> * to cover the pipeline delay. > >>>>>>>>>>>>> */ > >>>>>>>>>>>>> .set mips32 > >>>>>>>>>>>>> - mfc0 v1, CP0_TCSTATUS > >>>>>>>>>>>>> + mfc0 v0, CP0_TCSTATUS > >>>>>>>>>>>>> .set mips0 > >>>>>>>>>>>>> - LONG_S v1, PT_TCSTATUS(sp) > >>>>>>>>>>>>> + LONG_S v0, PT_TCSTATUS(sp) > >>>>>>>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */ > >>>>>>>>>>>>> LONG_S $4, PT_R4(sp) > >>>>>>>>>>>>> LONG_S $5, PT_R5(sp) > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> /K. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote: > >>>>>>>>>>>>>>> Hi Kevin, Stuart , > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Woohooo You guys spotted !. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be > >>>>>>>>>>>>>>> the culprit > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started > >>>>>>>>>>>>>>> booting !. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Anoop > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote: > >>>>>>>>>>>>>>>> Thank you, Stuart! 
I've spotted some definite breakage to SMTC between > >>>>>>>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved > >>>>>>>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204, > >>>>>>>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a > >>>>>>>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of > >>>>>>>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the > >>>>>>>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets > >>>>>>>>>>>>>>>> clobbered before it gets stored. This will eventually result in the > >>>>>>>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common, > >>>>>>>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS > >>>>>>>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance > >>>>>>>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2 > >>>>>>>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd > >>>>>>>>>>>>>>>> lean toward the second option, but I'm not in a position to test and > >>>>>>>>>>>>>>>> submit a patch just now. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Kevin K. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote: > >>>>>>>>>>>>>>>>> Kevin, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'm not sure if it's useful, > >>>>>>>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out. 
> >>>>>>>>>>>>>>>>> works 2.6.32-stable with patch 804 > >>>>>>>>>>>>>>>>> works_not 2.6.33-stable > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC > >>>>>>>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> arch/mips/include/asm/irq.h > >>>>>>>>>>>>>>>>> arch/mips/kernel/irq.c > >>>>>>>>>>>>>>>>> do_IRQ > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> arch/mips/include/asm/stackframe.h > >>>>>>>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> arch/mips/include/asm/time.h > >>>>>>>>>>>>>>>>> clocksource_set_clock > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> arch/mips/kernel/process.c > >>>>>>>>>>>>>>>>> cpu_idle > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> arch/mips/kernel/smtc.c > >>>>>>>>>>>>>>>>> __irq_entry > >>>>>>>>>>>>>>>>> ipi_decode > >>>>>>>>>>>>>>>>> SMTC_CLOCK_TICK > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'll try to look in more detail after Christmas. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Stuart > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > > > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-04 17:54 ` Anoop P A @ 2011-01-04 18:33 ` Kevin D. Kissell 2011-01-05 13:11 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-04 18:33 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 01/04/11 09:54, Anoop P A wrote: > On Tue, 2011-01-04 at 09:21 -0800, Kevin D. Kissell wrote: >> I'm trying to figure out a reason why your change below should help, and >> offhand, modulo tool bugs, I don't see it. I'm assuming that your diff >> below is a diff relative to the pre-patch stackframe.h. I wouldn't > Yes patch created against stock code . > >> bless it as an alternative because it moves code and comments >> unnecessarily - all you should really have to do is to move the >> >> >> 190 mfc0 v1, CP0_STATUS >> 191 LONG_S $2, PT_R2(sp) >> >> to be just after the #endif /* CONFIG_MIPS_MT_SMTC */ at around line 201. > Actually I just moved code under CONFIG_MIPS_MT_SMTC to previous block > of code ( which store $0 ) . git diff did the rest on behalf of me :) > >> If moving the save of zero to PT_R0(sp) actually makes a difference, >> it's evidence that you've got problems in your toolchain (or, heaven >> forbid, your pipeline)! > In previous version of patch usage of V0 was creating issue. I have > verified this with previous version of code ( working code before > David's instruction rearrangement patch.) . Argh. It's not very clearly commented, but it looks as if the system call trap handler has an implicit assumption that v0 has never been changed by SAVE_SOME, TRACE_IRQS_ON_RELOAD, or STI. So yeah, moving the code around to fix the v1 conflict ends up being better than using v0 - otherwise, we'd need to add a LONG_L v0, PT_R2(sp) somewhere after the LONG_S v0, PT_TCSTATUS(sp) of the original patch. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-04 18:33 ` Kevin D. Kissell @ 2011-01-05 13:11 ` Anoop P A 2011-01-05 19:23 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread From: Anoop P A @ 2011-01-05 13:11 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips On Tue, 2011-01-04 at 10:33 -0800, Kevin D. Kissell wrote: > On 01/04/11 09:54, Anoop P A wrote: > > On Tue, 2011-01-04 at 09:21 -0800, Kevin D. Kissell wrote: > >> I'm trying to figure out a reason why your change below should help, and > >> offhand, modulo tool bugs, I don't see it. I'm assuming that your diff > >> below is a diff relative to the pre-patch stackframe.h. I wouldn't > > Yes patch created against stock code . > > > >> bless it as an alternative because it moves code and comments > >> unnecessarily - all you should really have to do is to move the > >> > >> > >> 190 mfc0 v1, CP0_STATUS > >> 191 LONG_S $2, PT_R2(sp) > >> > >> to be just after the #endif /* CONFIG_MIPS_MT_SMTC */ at around line 201. > > Actually I just moved code under CONFIG_MIPS_MT_SMTC to previous block > > of code ( which store $0 ) . git diff did the rest on behalf of me :) > > > >> If moving the save of zero to PT_R0(sp) actually makes a difference, > >> it's evidence that you've got problems in your toolchain (or, heaven > >> forbid, your pipeline)! > > In previous version of patch usage of V0 was creating issue. I have > > verified this with previous version of code ( working code before > > David's instruction rearrangement patch.) . > > Argh. It's not very clearly commented, but it looks as if the system > call trap handler has an implicit assumption that v0 has never been > changed by SAVE_SOME, TRACE_IRQS_ON_RELOAD, or STI. So yeah, moving the > code around to fix the v1 conflict ends up being better than using v0 - > otherwise, we'd need to add a LONG_L v0, PT_R2(sp) somewhere after the > LONG_S v0, PT_TCSTATUS(sp) of the original patch. Well, Here is the patch. 
diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h
index 58730c5..19418c4 100644
--- a/arch/mips/include/asm/stackframe.h
+++ b/arch/mips/include/asm/stackframe.h
@@ -187,8 +187,6 @@
 		 * need it to operate correctly
 		 */
 		LONG_S	$0, PT_R0(sp)
-		mfc0	v1, CP0_STATUS
-		LONG_S	$2, PT_R2(sp)
 #ifdef CONFIG_MIPS_MT_SMTC
 		/*
 		 * Ideally, these instructions would be shuffled in
@@ -199,6 +197,8 @@
 		.set	mips0
 		LONG_S	v1, PT_TCSTATUS(sp)
 #endif /* CONFIG_MIPS_MT_SMTC */
+		mfc0	v1, CP0_STATUS
+		LONG_S	$2, PT_R2(sp)
 		LONG_S	$4, PT_R4(sp)
 		LONG_S	$5, PT_R5(sp)
 		LONG_S	v1, PT_STATUS(sp)

>
> Regards,
>
> Kevin K.

^ permalink raw reply related [flat|nested] 68+ messages in thread
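The register conflict discussed in this exchange can be sketched in plain C. This is only a toy model, not kernel code: the `toy_cpu` and `toy_frame` structures and the three `save_some_*` functions are hypothetical names invented here, and each C statement stands in for the assembler instruction named in its comment. It compares the ordering that clobbers the Status copy, the variant that stages TCStatus in v0, and the ordering in the patch above.

```c
#include <stdint.h>

/* Toy register file: only the registers that matter to the discussion. */
struct toy_cpu {
	uint32_t v0;           /* $2, may hold a live value on entry */
	uint32_t v1;           /* $3, used as the staging register */
	uint32_t cp0_status;   /* CP0 Status */
	uint32_t cp0_tcstatus; /* CP0 TCStatus (SMTC only) */
};

/* Toy save area; the fields stand in for PT_R2, PT_STATUS, PT_TCSTATUS. */
struct toy_frame {
	uint32_t r2;
	uint32_t status;
	uint32_t tcstatus;
};

/* Status fetched early into v1, but stored only after the SMTC block. */
void save_some_broken(struct toy_cpu *c, struct toy_frame *f)
{
	c->v1 = c->cp0_status;   /* mfc0   v1, CP0_STATUS */
	f->r2 = c->v0;           /* LONG_S $2, PT_R2(sp) */
	c->v1 = c->cp0_tcstatus; /* mfc0   v1, CP0_TCSTATUS (clobbers v1) */
	f->tcstatus = c->v1;     /* LONG_S v1, PT_TCSTATUS(sp) */
	f->status = c->v1;       /* LONG_S v1, PT_STATUS(sp): TCStatus value */
}

/* TCStatus staged in v0: the frame is right, but the live v0 is gone. */
void save_some_v0(struct toy_cpu *c, struct toy_frame *f)
{
	c->v1 = c->cp0_status;   /* mfc0   v1, CP0_STATUS */
	f->r2 = c->v0;           /* LONG_S $2, PT_R2(sp) */
	c->v0 = c->cp0_tcstatus; /* mfc0   v0, CP0_TCSTATUS (changes v0) */
	f->tcstatus = c->v0;     /* LONG_S v0, PT_TCSTATUS(sp) */
	f->status = c->v1;       /* LONG_S v1, PT_STATUS(sp) */
}

/* SMTC block first, then the Status fetch: no conflict, v0 untouched. */
void save_some_fixed(struct toy_cpu *c, struct toy_frame *f)
{
	c->v1 = c->cp0_tcstatus; /* mfc0   v1, CP0_TCSTATUS */
	f->tcstatus = c->v1;     /* LONG_S v1, PT_TCSTATUS(sp) */
	c->v1 = c->cp0_status;   /* mfc0   v1, CP0_STATUS */
	f->r2 = c->v0;           /* LONG_S $2, PT_R2(sp) */
	f->status = c->v1;       /* LONG_S v1, PT_STATUS(sp) */
}
```

Running each function on the same starting state and comparing `f->status` with `c->cp0_status`, and `c->v0` with its entry value, separates the three cases: only the last ordering leaves both the saved frame and v0 intact.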
* Re: SMTC support status in latest git head. 2011-01-05 13:11 ` Anoop P A @ 2011-01-05 19:23 ` Kevin D. Kissell 2011-01-06 20:23 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-05 19:23 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 01/05/11 05:11, Anoop P A wrote: > On Tue, 2011-01-04 at 10:33 -0800, Kevin D. Kissell wrote: >> On 01/04/11 09:54, Anoop P A wrote: >>> On Tue, 2011-01-04 at 09:21 -0800, Kevin D. Kissell wrote: >>>> I'm trying to figure out a reason why your change below should help, and >>>> offhand, modulo tool bugs, I don't see it. I'm assuming that your diff >>>> below is a diff relative to the pre-patch stackframe.h. I wouldn't >>> Yes patch created against stock code . >>> >>>> bless it as an alternative because it moves code and comments >>>> unnecessarily - all you should really have to do is to move the >>>> >>>> >>>> 190 mfc0 v1, CP0_STATUS >>>> 191 LONG_S $2, PT_R2(sp) >>>> >>>> to be just after the #endif /* CONFIG_MIPS_MT_SMTC */ at around line 201. >>> Actually I just moved code under CONFIG_MIPS_MT_SMTC to previous block >>> of code ( which store $0 ) . git diff did the rest on behalf of me :) >>> >>>> If moving the save of zero to PT_R0(sp) actually makes a difference, >>>> it's evidence that you've got problems in your toolchain (or, heaven >>>> forbid, your pipeline)! >>> In previous version of patch usage of V0 was creating issue. I have >>> verified this with previous version of code ( working code before >>> David's instruction rearrangement patch.) . >> Argh. It's not very clearly commented, but it looks as if the system >> call trap handler has an implicit assumption that v0 has never been >> changed by SAVE_SOME, TRACE_IRQS_ON_RELOAD, or STI. 
So yeah, moving the >> code around to fix the v1 conflict ends up being better than using v0 - >> otherwise, we'd need to add a LONG_L v0, PT_R2(sp) somewhere after the >> LONG_S v0, PT_TCSTATUS(sp) of the original patch. > Well, Here is the patch. > > diff --git a/arch/mips/include/asm/stackframe.h > b/arch/mips/include/asm/stackframe.h > index 58730c5..19418c4 100644 > --- a/arch/mips/include/asm/stackframe.h > +++ b/arch/mips/include/asm/stackframe.h > @@ -187,8 +187,6 @@ > * need it to operate correctly > */ > LONG_S $0, PT_R0(sp) > - mfc0 v1, CP0_STATUS > - LONG_S $2, PT_R2(sp) > #ifdef CONFIG_MIPS_MT_SMTC > /* > * Ideally, these instructions would be shuffled in > @@ -199,6 +197,8 @@ > .set mips0 > LONG_S v1, PT_TCSTATUS(sp) > #endif /* CONFIG_MIPS_MT_SMTC */ > + mfc0 v1, CP0_STATUS > + LONG_S $2, PT_R2(sp) > LONG_S $4, PT_R4(sp) > LONG_S $5, PT_R5(sp) > LONG_S v1, PT_STATUS(sp) That's exactly what I'd propose as the cleanest minimal fix. I've got a version that also replaces the .set mips32 / .set mips0 with the .set push / .set pop paradigm, which I'd have used in the original code if I'd known at the time about that assembler directive. I'm hoping to be able to test on a Malta/34K reference platform, and make sure there isn't breakage on that platform branch as well, before we commit to the repository. Your msp_smtc.c file looks plausible on the face of it. The init_secondary function has the quirk that it expects to execute on each "CPU" in numerical order, which is very likely but not guaranteed. It *ought* to be harmless in the rare case where it fails, but the assumption is worth a comment, IMHO. At this point, there shouldn't be a whole lot of SMTC-specific mystery to get your timer running on the second VPE. You know it's taking interrupts, because of the IPIs getting through, so in principle you just need to run the chain of enables from the clock peripheral itself through the CIC to the CPU core and the IM bits. 
It would be really cool if we could get a stable repository branch that boots SMTC out-of-the-box on both Malta and the MSP platform. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-05 19:23 ` Kevin D. Kissell @ 2011-01-06 20:23 ` Anoop P A 2011-01-06 23:31 ` Kevin D. Kissell 0 siblings, 1 reply; 68+ messages in thread From: Anoop P A @ 2011-01-06 20:23 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips

On Wed, 2011-01-05 at 11:23 -0800, Kevin D. Kissell wrote:
> > 	LONG_S	$5, PT_R5(sp)
> > 	LONG_S	v1, PT_STATUS(sp)
>
> That's exactly what I'd propose as the cleanest minimal fix. I've got a
> version that also replaces the .set mips32 / .set mips0 with the .set
> push / .set pop paradigm, which I'd have used in the original code if
> I'd known at the time about that assembler directive. I'm hoping to be
> able to test on a Malta/34K reference platform, and make sure there
> isn't breakage on that platform branch as well, before we commit to the
> repository.

I hope somebody can test this patch on a Malta/34K platform. I don't have
access to any Malta boards, and I believe 34K MT simulation is not
available in QEMU.

>
> Your msp_smtc.c file looks plausible on the face of it. The
> init_secondary function has the quirk that it expects to execute on each
> "CPU" in numerical order, which is very likely but not guaranteed. It
> *ought* to be harmless in the rare case where it fails, but the
> assumption is worth a comment, IMHO.

Yes, I will add a comment.

>
> At this point, there shouldn't be a whole lot of SMTC-specific mystery
> to get your timer running on the second VPE. You know it's taking
> interrupts, because of the IPIs getting through, so in principle you
> just need to run the chain of enables from the clock peripheral itself
> through the CIC to the CPU core and the IM bits.

I hope we are almost there. I have made some progress with the debugging.
I think you should be able to give better insight into the observations I
have made.

1. Without selecting CONFIG_MIPS_MT_SMTC_IM_BACKSTOP, my kernel hangs in
the calibration loop itself (I haven't looked further into this).

2. With CONFIG_MIPS_MT_SMTC_IM_BACKSTOP, I found I am getting 3 VPE1-TIMER
interrupts (one for each TC of VPE1). However, these interrupts are not
getting carried through to c0_compare_interrupt.

The do_IRQ call had an SMTC hook which modifies tccontext (to reduce
complexity I haven't selected SMTC affinity).

Once I disabled this call, I am seeing VPE1 timer interrupts and am able
to boot completely without any issues so far :).

/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6
  1:        171     727459     727561     727533         27     727446     727453  MIPS     SMTC_IPI
  6:          0          0          0          0          0          0          0  MIPS     MSP CIC cascade
  8:          0          0          0          0          0          0          0  MSP_CIC  Softreset button
  9:          0          0          0          0          0          0          0  MSP_CIC  Standby switch
 21:          0          0          0          0          0          0          0  MSP_CIC  MSP PER cascade
 25:     727507        484         11          0          0          0          0  MSP_CIC  timer
 27:          0          0          0          0        258         10          1  MSP_CIC  serial
 34:          0          0          0          0     727533          7          1  MSP_CIC  timer

BTW, the following code in my CIC init was setting hwmask.

	/* initialize all the IRQ descriptors */
	for (i = MSP_CIC_INTBASE ; i < MSP_CIC_INTBASE + 32 ; i++) {
		set_irq_chip_and_handler(i, &msp_cic_irq_controller,
					 handle_level_irq);
#ifdef CONFIG_MIPS_MT_SMTC
		irq_hwmask[i] = C_IRQ4;
#endif
	}

> It would be really cool if we could get a stable repository branch that
> boots SMTC out-of-the-box on both Malta and the MSP platform.

:)

>
> Regards,
>
> Kevin K.
>

Thanks
Anoop

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-06 20:23 ` Anoop P A @ 2011-01-06 23:31 ` Kevin D. Kissell 2011-01-07 7:56 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-06 23:31 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 01/06/11 12:23, Anoop P A wrote: > On Wed, 2011-01-05 at 11:23 -0800, Kevin D. Kissell wrote: >> At this point, there shouldn't be a whole lot of SMTC-specific mystery >> to get your timer running on the second VPE. You know it's taking >> interrupts, because of the IPIs getting through, so in principle you >> just need to run the chain of enables from the clock peripheral itself >> through the CIC to the CPU core and the IM bits. > I hope we are almost there. I have made some progress with the debug . I > think you should be able to give better insight to the observation I > have made. > > 1. Without selecting CONFIG_MIPS_MT_SMTC_IM_BACKSTOP My kernel hangs in > calibration loop itself . ( I haven't looked further into this). That suggests a problem with Status.IM initialization and/or the handling of irq_hwmask[]. Do you mean that this is always true, or only if VPE1 is being booted? You haven't mentioned it before. > 2. With CONFIG_MIPS_MT_SMTC_IM_BACKSTOP I found I am getting 3 > VPE1-TIMER interrupt ( one for each TC of VPE1) .However this interrupts > are not getting carried till c0_compare_interrupt . Would you expect them to? I thought you were using an outboard timer and not the CP0 Compare interrupt. > do_IRQ call had a SMTC hook which is modifying tccontext ( To reduce > complexity I haven't selected SMTC affinity). > > Once I disabled this call . I am seeing VPE1 timer interrupts and able > to boot completely without any issue's so far :). So long as you've got the IM_BACKSTOP hack enabled, right? 
Because otherwise, without that __DO_IRQ_SMTC_HOOK() invocation > / # cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > CPU6 > 1: 171 727459 727561 727533 27 727446 > 727453 MIPS SMTC_IPI > 6: 0 0 0 0 0 0 > 0 MIPS MSP CIC cascade > 8: 0 0 0 0 0 0 > 0 MSP_CIC Softreset button > 9: 0 0 0 0 0 0 > 0 MSP_CIC Standby switch > 21: 0 0 0 0 0 0 > 0 MSP_CIC MSP PER cascade > 25: 727507 484 11 0 0 0 > 0 MSP_CIC timer > 27: 0 0 0 0 258 10 > 1 MSP_CIC serial > 34: 0 0 0 0 727533 7 > 1 MSP_CIC timer > > > BTW following code in my cic init was setting hwmask. > > /* initialize all the IRQ descriptors */ > for (i = MSP_CIC_INTBASE ; i< MSP_CIC_INTBASE + 32 ; i++) { > set_irq_chip_and_handler(i,&msp_cic_irq_controller, > handle_level_irq); > #ifdef CONFIG_MIPS_MT_SMTC > irq_hwmask[i] = C_IRQ4; > #endif > } I'm sure I've said this before, and it's in various comments in the SMTC code, but remember, one of the main problems that the SMTC kernel had to solve was to prevent all TCs of a VPE from "convoying" after every interrupt. The way this is done is that the interrupt vector code, before clearing EXL, masks off the Status.IM bit associated with the incoming interrupt. Of course, to get another interrupt from the same source (or collection of sources), that IM bit needs to be restored. The "correct" mechanism for this is by having the appropriate irq_hwmask[] value set, so that smtc_im_ack_irq(), which should be invoked on an irq "ack()" (meaning that the source has been quenched and any new occurrence should be considered a new interrupt), will restore the bit in Status. This function got moved around a bit in the various SMTC prototypes, but it proved least intrusive to put it into the xxx_mask_and_ack() functions for the interrupt controllers - see irq-msc01.c and i8259.c. If you haven't done the same in any equivalent code for a different on-chip controller, you'll definitely have problems. 
The Backstop scheme works OK for peripheral interrupts that didn't have an appropriate irq_hwmask[] value set up, but clock interrupts don't follow the same code paths and can't depend on the backstop. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
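The mask-then-restore cycle described in this message can be sketched as a small C model. Everything here is a simplification under stated assumptions: `toy_vpe`, `toy_vector_entry`, `toy_im_ack_irq` and `toy_can_take` are hypothetical names, the Status register is reduced to its IM field (assumed to be bits 15..8), and `TOY_C_IRQ4` is assumed to be bit 14, mirroring the hardware interrupt line that `C_IRQ4` refers to. It is not the kernel's `smtc_im_ack_irq()`, only an illustration of why an unset `irq_hwmask[]` entry leaves the line masked.

```c
#include <stdint.h>

#define TOY_NR_IRQS 64
#define TOY_ST0_IM  0x0000ff00u /* assumed position of Status.IM */
#define TOY_C_IRQ4  (1u << 14)  /* assumed IM bit for hardware interrupt 4 */

struct toy_vpe {
	uint32_t status;                  /* only the IM field is modeled */
	uint32_t irq_hwmask[TOY_NR_IRQS]; /* IM bit each irq arrives on */
};

/*
 * Vector entry: mask the IM bit of the incoming interrupt before EXL is
 * cleared, so the other TCs of the VPE do not all take the same interrupt.
 */
void toy_vector_entry(struct toy_vpe *v, uint32_t im_bit)
{
	v->status &= ~(im_bit & TOY_ST0_IM);
}

/*
 * ack(): the source has been quenched, so restore whatever IM bit
 * irq_hwmask[] records for this irq. If the entry was never set up,
 * nothing is restored and the line stays masked.
 */
void toy_im_ack_irq(struct toy_vpe *v, unsigned int irq)
{
	if (irq < TOY_NR_IRQS)
		v->status |= v->irq_hwmask[irq] & TOY_ST0_IM;
}

/* Nonzero if a new interrupt on this IM bit could be taken. */
int toy_can_take(const struct toy_vpe *v, uint32_t im_bit)
{
	return (v->status & im_bit) != 0;
}
```

With `irq_hwmask[irq]` set to `TOY_C_IRQ4`, a `toy_vector_entry()` followed by `toy_im_ack_irq()` leaves the line enabled again; with the entry left at zero, the same sequence leaves it masked, which is the case the backstop is meant to catch for peripheral interrupts.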
* Re: SMTC support status in latest git head. 2011-01-06 23:31 ` Kevin D. Kissell @ 2011-01-07 7:56 ` Anoop P A 2011-01-07 18:46 ` Kevin D. Kissell 2011-01-10 19:30 ` Kevin D. Kissell 0 siblings, 2 replies; 68+ messages in thread From: Anoop P A @ 2011-01-07 7:56 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips [-- Attachment #1: Type: text/plain, Size: 2687 bytes --] On Thu, 2011-01-06 at 15:31 -0800, Kevin D. Kissell wrote: > On 01/06/11 12:23, Anoop P A wrote: > > I'm sure I've said this before, and it's in various comments in the SMTC > code, but remember, one of the main problems that the SMTC kernel > had to solve was to prevent all TCs of a VPE from "convoying" after every > interrupt. The way this is done is that the interrupt vector code, before > clearing EXL, masks off the Status.IM bit associated with the incoming > interrupt. Of course, to get another interrupt from the same source > (or collection of sources), that IM bit needs to be restored. The "correct" > mechanism for this is by having the appropriate irq_hwmask[] value set, > so that smtc_im_ack_irq(), which should be invoked on an irq "ack()" > (meaning that the source has been quenched and any new occurrence > should be considered a new interrupt), will restore the bit in Status. > This function got moved around a bit in the various SMTC prototypes, > but it proved least intrusive to put it into the xxx_mask_and_ack() > functions > for the interrupt controllers - see irq-msc01.c and i8259.c. If you haven't > done the same in any equivalent code for a different on-chip controller, > you'll definitely have problems. > > The Backstop scheme works OK for peripheral interrupts that didn't > have an appropriate irq_hwmask[] value set up, but clock interrupts > don't follow the same code paths and can't depend on the backstop. Ok. Well thanks much for your detailed explanation. Well I hope I found the root cause . 
smtc_clockevent_init() was overriding irq_hwmask even if we are using a
platform-specific get_c0_compare_int. With the following patch everything
seems to be working for me.

------------------------------------------------------------------------
diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c
index 2e72d30..a25fc59 100644
--- a/arch/mips/kernel/cevt-smtc.c
+++ b/arch/mips/kernel/cevt-smtc.c
@@ -310,9 +310,14 @@ int __cpuinit smtc_clockevent_init(void)
 		return 0;
 	/*
 	 * And we need the hwmask associated with the c0_compare
-	 * vector to be initialized.
+	 * vector to be initialized. However incase of platform
+	 * specific get_co_compare_int, don't override irq_hwmask
+	 * expect platform code to set a valid mask value.
 	 */
-	irq_hwmask[irq] = (0x100 << cp0_compare_irq);
+
+	if (!get_c0_compare_int)
+		irq_hwmask[irq] = (0x100 << cp0_compare_irq);
+
 	if (cp0_timer_irq_installed)
 		return 0;

-----------------------------------------------------------------------

Attaching my msp_irq_cic.c. Kindly have a look if possible.

Thanks
Anoop

>
> Regards,
>
> Kevin K.

[-- Attachment #2: msp_irq_cic.c --]
[-- Type: text/x-csrc, Size: 5357 bytes --]

/*
 * Copyright 2010 PMC-Sierra, Inc, derived from irq_cpu.c
 *
 * This file define the irq handler for MSP CIC subsystem interrupts.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the
 * Free Software Foundation; either version 2 of the License, or (at your
 * option) any later version.
 */

#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/bitops.h>
#include <linux/irq.h>

#include <asm/mipsregs.h>
#include <asm/system.h>

#include <msp_cic_int.h>
#include <msp_regs.h>

/*
 * External API
 */
extern void msp_per_irq_init(void);
extern void msp_per_irq_dispatch(void);

/*
 * Convenience Macro. Should be somewhere generic.
 */
#define get_current_vpe() \
	((read_c0_tcbind() >> TCBIND_CURVPE_SHIFT) & TCBIND_CURVPE)

#ifdef CONFIG_SMP

#define LOCK_VPE(flags, mtflags) \
do { \
	local_irq_save(flags); \
	mtflags = dmt(); \
} while (0)

#define UNLOCK_VPE(flags, mtflags) \
do { \
	emt(mtflags); \
	local_irq_restore(flags);\
} while (0)

#define LOCK_CORE(flags, mtflags) \
do { \
	local_irq_save(flags); \
	mtflags = dvpe(); \
} while (0)

#define UNLOCK_CORE(flags, mtflags) \
do { \
	evpe(mtflags); \
	local_irq_restore(flags);\
} while (0)

#else

#define LOCK_VPE(flags, mtflags)
#define UNLOCK_VPE(flags, mtflags)
#endif

/* ensure writes to cic are completed */
static inline void cic_wmb(void)
{
	const volatile void __iomem *cic_mem = CIC_VPE0_MSK_REG;
	volatile u32 dummy_read;

	wmb();
	dummy_read = __raw_readl(cic_mem);
	dummy_read++;
}

static inline void unmask_cic_irq(unsigned int irq)
{
	volatile u32 *cic_msk_reg = CIC_VPE0_MSK_REG;
	int vpe;
#ifdef CONFIG_SMP
	unsigned int mtflags;
	unsigned long flags;

	/*
	 * Make sure we have IRQ affinity. It may have changed while
	 * we were processing the IRQ.
	 */
	if (!cpumask_test_cpu(smp_processor_id(), irq_desc[irq].affinity))
		return;
#endif

	vpe = get_current_vpe();
	LOCK_VPE(flags, mtflags);
	cic_msk_reg[vpe] |= (1 << (irq - MSP_CIC_INTBASE));
	UNLOCK_VPE(flags, mtflags);
	cic_wmb();
}

static inline void mask_cic_irq(unsigned int irq)
{
	volatile u32 *cic_msk_reg = CIC_VPE0_MSK_REG;
	int vpe = get_current_vpe();
#ifdef CONFIG_SMP
	unsigned long flags, mtflags;
#endif
	LOCK_VPE(flags, mtflags);
	cic_msk_reg[vpe] &= ~(1 << (irq - MSP_CIC_INTBASE));
	UNLOCK_VPE(flags, mtflags);
	cic_wmb();
}

static inline void msp_cic_irq_ack(unsigned int irq)
{
	mask_cic_irq(irq);
	/*
	 * Only really necessary for 18, 16-14 and sometimes 3:0
	 * (since these can be edge sensitive) but it doesn't
	 * hurt for the others
	 */
	*CIC_STS_REG = (1 << (irq - MSP_CIC_INTBASE));
	smtc_im_ack_irq(irq);
}

static void msp_cic_irq_end(unsigned int irq)
{
	if (!(irq_desc[irq].status & (IRQ_DISABLED | IRQ_INPROGRESS)))
		unmask_cic_irq(irq);
}

#ifdef CONFIG_SMP
static inline int msp_cic_irq_set_affinity(unsigned int irq,
					const struct cpumask *cpumask)
{
	int cpu;
	unsigned long flags;
	unsigned int mtflags;
	unsigned long imask = (1 << (irq - MSP_CIC_INTBASE));
	volatile u32 *cic_mask = (volatile u32 *)CIC_VPE0_MSK_REG;

	/* timer balancing should be disabled in kernel code */
	BUG_ON(irq == MSP_INT_VPE0_TIMER || irq == MSP_INT_VPE1_TIMER);

	LOCK_CORE(flags, mtflags);
	/* enable if any of each VPE's TCs require this IRQ */
	for_each_online_cpu(cpu) {
		if (cpumask_test_cpu(cpu, cpumask))
			cic_mask[cpu] |= imask;
		else
			cic_mask[cpu] &= ~imask;
	}

	UNLOCK_CORE(flags, mtflags);
	return 0;
}
#endif

static struct irq_chip msp_cic_irq_controller = {
	.name = "MSP_CIC",
	.mask = msp_cic_irq_ack,
	.mask_ack = msp_cic_irq_ack,
	.unmask = unmask_cic_irq,
	.ack = msp_cic_irq_ack,
	.end = msp_cic_irq_end,
#ifdef CONFIG_SMP
	.set_affinity = msp_cic_irq_set_affinity,
#endif
};

void __init msp_cic_irq_init(void)
{
	int i;
	/* Mask/clear interrupts. */
	*CIC_VPE0_MSK_REG = 0x00000000;
	*CIC_VPE1_MSK_REG = 0x00000000;
	*CIC_STS_REG = 0xFFFFFFFF;
	/*
	 * The MSP7120 RG and EVBD boards use IRQ[6:4] for PCI.
	 * These inputs map to EXT_INT_POL[6:4] inside the CIC.
	 * They are to be active low, level sensitive.
	 */
	*CIC_EXT_CFG_REG &= 0xFFFF8F8F;

	/* initialize all the IRQ descriptors */
	for (i = MSP_CIC_INTBASE ; i < MSP_CIC_INTBASE + 32 ; i++) {
		set_irq_chip_and_handler(i, &msp_cic_irq_controller,
					 handle_level_irq);
#ifdef CONFIG_MIPS_MT_SMTC
		/* Mask of CIC interrupt */
		irq_hwmask[i] = C_IRQ4;
#endif
	}

	/* Initialize the PER interrupt sub-system */
	msp_per_irq_init();
}

/* CIC masked by CIC vector processing before dispatch called */
void msp_cic_irq_dispatch(void)
{
	volatile u32 *cic_msk_reg = (volatile u32 *)CIC_VPE0_MSK_REG;
	u32 cic_mask;
	u32 pending;
	int cic_status = *CIC_STS_REG;
	cic_mask = cic_msk_reg[get_current_vpe()];
	pending = cic_status & cic_mask;
	if (pending & (1 << (MSP_INT_VPE0_TIMER - MSP_CIC_INTBASE))) {
		do_IRQ(MSP_INT_VPE0_TIMER);
	} else if (pending & (1 << (MSP_INT_VPE1_TIMER - MSP_CIC_INTBASE))) {
		do_IRQ(MSP_INT_VPE1_TIMER);
	} else if (pending & (1 << (MSP_INT_PER - MSP_CIC_INTBASE))) {
		msp_per_irq_dispatch();
	} else if (pending) {
		do_IRQ(ffs(pending) + MSP_CIC_INTBASE - 1);
	} else{
		spurious_interrupt();
		/* Re-enable the CIC cascaded interrupt. */
		irq_desc[MSP_INT_CIC].chip->end(MSP_INT_CIC);
	}
}

^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-07 7:56 ` Anoop P A @ 2011-01-07 18:46 ` Kevin D. Kissell 2011-01-08 19:33 ` Anoop P A 2011-01-10 19:30 ` Kevin D. Kissell 1 sibling, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-07 18:46 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 01/06/11 23:56, Anoop P A wrote: > On Thu, 2011-01-06 at 15:31 -0800, Kevin D. Kissell wrote: >> I'm sure I've said this before, and it's in various comments in the SMTC >> code, but... As an aside to this conversation, would it be possible to create a Documentation/mips/SMTC.txt file that would actually propagate upstream, so that I'd stop being the sole repository of SMTC folklore? I only maintain it as a hobby. > Ok. Well thanks much for your detailed explanation. Well I hope I found > the root cause . smtc_clockevent_init() was overriding irq_hwmask even > if are using platform specific get_c0_compare_int. With following patch > everything seems to be working for me. > ------------------------------------------------------------------------ > diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c > index 2e72d30..a25fc59 100644 > --- a/arch/mips/kernel/cevt-smtc.c > +++ b/arch/mips/kernel/cevt-smtc.c > @@ -310,9 +310,14 @@ int __cpuinit smtc_clockevent_init(void) > return 0; > /* > * And we need the hwmask associated with the c0_compare > - * vector to be initialized. > + * vector to be initialized. However incase of platform > + * specific get_co_compare_int, don't override irq_hwmask > + * expect platform code to set a valid mask value. > */ > - irq_hwmask[irq] = (0x100<< cp0_compare_irq); > + > + if (!get_c0_compare_int) > + irq_hwmask[irq] = (0x100<< cp0_compare_irq); > + > if (cp0_timer_irq_installed) > return 0; > ----------------------------------------------------------------------- I'm still not clear on one point that, to me, is pretty important when engineering a fix here. 
Are you, in fact, using the Count/Compare interrupt system, but having the externalization of the compare interrupt routed back through an intervening interrupt controller, or is your timer coming from another source? In the former case, I think you're on the right track as to the possible cause of a problem, but the fix should actually be simpler and rather more elegant. Why can't you simply see to it that cp0_compare_irq is set to the right value, either at compile time, or in your earliest platform initialization of the interrupt controller? That would be a one-line, inline change and spare us another cryptic conditional. In the latter case, you'll presumably be having lots of other problems, as cevt-smtc.c is intertwined with cevt-r4k.c and the Count/Compare paradigm. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
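For readers following the arithmetic in the quoted patch: the expression (0x100 << cp0_compare_irq) just selects the Status.IM bit that belongs to a given CPU interrupt line. The following is a minimal user-space sketch of that mapping, not kernel code. It assumes the usual MIPS32 layout in which IM0 sits at bit 8, and the helper name hwmask_for_cpu_irq() is hypothetical.

```c
#include <stdint.h>

/* Assumption: Status.IM0 is bit 8, so IM[n] is bit (8 + n). */
#define STATUS_IM0 0x100u

/* Hypothetical helper: Status.IM mask for CPU interrupt line n (0..7). */
static uint32_t hwmask_for_cpu_irq(unsigned int n)
{
	return STATUS_IM0 << n;
}
```

With the compare interrupt arriving through a controller cascaded on CPU interrupt 6, as reported earlier in this thread, the helper yields 0x4000. If I read mipsregs.h correctly, that is the same bit as the C_IRQ4 value the CIC code stores in irq_hwmask[], which is why setting cp0_compare_irq appropriately would likely produce the right mask without an extra conditional.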
* Re: SMTC support status in latest git head. 2011-01-07 18:46 ` Kevin D. Kissell @ 2011-01-08 19:33 ` Anoop P A 0 siblings, 0 replies; 68+ messages in thread From: Anoop P A @ 2011-01-08 19:33 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips On Sat, Jan 8, 2011 at 12:16 AM, Kevin D. Kissell <kevink@paralogos.com> wrote: > On 01/06/11 23:56, Anoop P A wrote: >> >> On Thu, 2011-01-06 at 15:31 -0800, Kevin D. Kissell wrote: >>> >>> I'm sure I've said this before, and it's in various comments in the SMTC >>> code, but... > > As an aside to this conversation, would it be possible to create a > Documentation/mips/SMTC.txt file that would actually propagate > upstream, so that I'd stop being the sole repository of SMTC folklore? > I only maintain it as a hobby. >> >> Ok. Well thanks much for your detailed explanation. Well I hope I found >> the root cause . smtc_clockevent_init() was overriding irq_hwmask even >> if are using platform specific get_c0_compare_int. With following patch >> everything seems to be working for me. >> ------------------------------------------------------------------------ >> diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c >> index 2e72d30..a25fc59 100644 >> --- a/arch/mips/kernel/cevt-smtc.c >> +++ b/arch/mips/kernel/cevt-smtc.c >> @@ -310,9 +310,14 @@ int __cpuinit smtc_clockevent_init(void) >> return 0; >> /* >> * And we need the hwmask associated with the c0_compare >> - * vector to be initialized. >> + * vector to be initialized. However incase of platform >> + * specific get_co_compare_int, don't override irq_hwmask >> + * expect platform code to set a valid mask value. 
>> */ >> - irq_hwmask[irq] = (0x100<< cp0_compare_irq); >> + >> + if (!get_c0_compare_int) >> + irq_hwmask[irq] = (0x100<< cp0_compare_irq); >> + >> if (cp0_timer_irq_installed) >> return 0; >> ----------------------------------------------------------------------- > > I'm still not clear on one point that, to me, is pretty important when > engineering a fix here. Are you, in fact, using the Count/Compare > interrupt system, but having the externalization of the compare > interrupt routed back through an intervening interrupt controller, > or is your timer coming from another source? > > In the former case, I think you're on the right track as to the > possible cause of a problem, but the fix should actually be simpler > and rather more elegant. Why can't you simply see to it that > cp0_compare_irq is set to the right value, either at compile time, > or in your earliest platform initialization of the interrupt controller? > That would be a one-line, inline change and spare us another > cryptic conditional. Yes ,it is first case. http://git.linux-mips.org/?p=linux.git;a=commit;h=38760d40ca61b18b2809e9c28df8b3ff9af8a02b Above mentioned patch enables platforms to utilize 4k timer code with platform specific timer interrupts. cevt-smtc also had ( copied from cevt-r4k) referred code. Given the specific irq support in cevt-smtc we should add support for specific hwmask , IMHO. > > In the later case, you'll presumably be having lots of other problems, > as cevt-smtc.c is intertwined with cevt-r4k.c and the Count/Compare > paradigm. > > Regards, > > Kevin K. > > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-07 7:56 ` Anoop P A 2011-01-07 18:46 ` Kevin D. Kissell @ 2011-01-10 19:30 ` Kevin D. Kissell 2011-01-11 4:05 ` Anoop P A 2011-01-13 7:53 ` Kevin D. Kissell 1 sibling, 2 replies; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-10 19:30 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 01/06/11 23:56, Anoop P A wrote: > On Thu, 2011-01-06 at 15:31 -0800, Kevin D. Kissell wrote: >> I'm sure I've said this before, and it's in various comments in the SMTC >> code, but remember, one of the main problems that the SMTC kernel >> had to solve was to prevent all TCs of a VPE from "convoying" after every >> interrupt. The way this is done is that the interrupt vector code, before >> clearing EXL, masks off the Status.IM bit associated with the incoming >> interrupt. Of course, to get another interrupt from the same source >> (or collection of sources), that IM bit needs to be restored. The "correct" >> mechanism for this is by having the appropriate irq_hwmask[] value set, >> so that smtc_im_ack_irq(), which should be invoked on an irq "ack()" >> (meaning that the source has been quenched and any new occurrence >> should be considered a new interrupt), will restore the bit in Status. >> This function got moved around a bit in the various SMTC prototypes, >> but it proved least intrusive to put it into the xxx_mask_and_ack() >> functions >> for the interrupt controllers - see irq-msc01.c and i8259.c. If you haven't >> done the same in any equivalent code for a different on-chip controller, >> you'll definitely have problems. >> >> The Backstop scheme works OK for peripheral interrupts that didn't >> have an appropriate irq_hwmask[] value set up, but clock interrupts >> don't follow the same code paths and can't depend on the backstop. > Ok. Well thanks much for your detailed explanation. Well I hope I found > the root cause . 
smtc_clockevent_init() was overriding irq_hwmask even > if are using platform specific get_c0_compare_int. With following patch > everything seems to be working for me. Would this still be with a "tickful" kernel? I was able to run some experiments on a Malta over the weekend, using mostly default Malta defconfig options including tickless operation. The 2.6.32.27 build comes up with both VPEs and all TCs firing. 2.6.36.2 with the stackframe.h patch boots all the way up on a single VPE, but VERY slowly - as if the Clock/Compare setups weren't being done correctly and timer intervals were waiting the full Count register rollover cycle. I've been looking at diffs, and merged one change that was made to cevt-r4k.c into the analogous routine in cevt-smtc.c (no change), but there's clearly more breakage to the SMTC/Malta configuration post-2.6.32 than just the stackframe.h patch. Going tickful may work around it, but tickful+SMTC is grossly inefficient. Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
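The mask-on-entry and restore-on-ack scheme described in the quoted explanation can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: status_im stands in for the Status.IM field, the array irq_hwmask_model[] stands in for irq_hwmask[], and both function names are hypothetical. It is not the SMTC implementation.

```c
#include <stdint.h>

#define NR_IRQS_MODEL 64

/* Stand-in for the Status.IM field of one TC. */
static uint32_t status_im;

/* Stand-in for irq_hwmask[]: which IM bit each irq arrives on. */
static uint32_t irq_hwmask_model[NR_IRQS_MODEL];

/* Vector entry: mask the IM bit of the incoming interrupt before
 * EXL is cleared, so the other TCs of the VPE do not convoy. */
static void model_vector_entry(uint32_t im_bit)
{
	status_im &= ~im_bit;
}

/* ack(): restore whatever bit irq_hwmask[] records for this irq. */
static void model_im_ack_irq(unsigned int irq)
{
	status_im |= irq_hwmask_model[irq];
}
```

The point of the model is the failure case: if the hwmask entry for an irq is zero or names the wrong bit, the ack restores nothing useful, the IM bit stays cleared, and that TC stops seeing interrupts from the source unless some backstop re-enables it.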
* Re: SMTC support status in latest git head. 2011-01-10 19:30 ` Kevin D. Kissell @ 2011-01-11 4:05 ` Anoop P A 2011-01-13 7:53 ` Kevin D. Kissell 1 sibling, 0 replies; 68+ messages in thread From: Anoop P A @ 2011-01-11 4:05 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips On Tue, Jan 11, 2011 at 1:00 AM, Kevin D. Kissell <kevink@paralogos.com> wrote: > > Would this still be with a "tickful" kernel? I was able to run some > experiments on a Malta over the weekend, using mostly default > Malta defconfig options including tickless operation. The 2.6.32.27 > build comes up with both VPEs and all TCs firing. 2.6.36.2 with > the stackframe.h patch boots all the way up on a single VPE, but > VERY slowly - as if the Clock/Compare setups weren't being done > correctly and timer intervals were waiting the full Count register > rollover cycle. I've been looking at diffs, and merged one change > that was made to cevt-r4k.c into the analogous routine in cevt-smtc.c > (no change), but there's clearly more breakage to the SMTC/Malta > configuration post-2.6.32 than just the stackframe.h patch. Going > tickful may work around it, but tickful+SMTC is grossly inefficient. Yes, that is true; my configuration is tickful. I had reported this issue with a tickless kernel. I think you missed my last email; I will resend it. Thanks Anoop > > Regards, > > Kevin K. > > > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
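The "full Count register rollover" symptom quoted above follows from 32-bit wraparound: if Compare is programmed to a value that Count has already passed, the next match only happens after Count wraps. A minimal sketch of the arithmetic, assuming a 32-bit free-running Count; the function name is hypothetical:

```c
#include <stdint.h>

/* Ticks until Count next equals Compare, with 32-bit wraparound.
 * Unsigned subtraction gives the modular distance directly. */
static uint32_t ticks_until_match(uint32_t count, uint32_t compare)
{
	return compare - count;
}
```

A Compare value 100 ticks ahead of Count fires after 100 ticks, but a value one tick behind fires only after 0xFFFFFFFF ticks. At a Count rate of, say, 300 MHz (an assumed figure, not taken from this thread), a full wrap is 2^32 / 300e6, or roughly 14 seconds per timer event, which would be consistent with a boot that completes but very slowly.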
* Re: SMTC support status in latest git head. 2011-01-10 19:30 ` Kevin D. Kissell 2011-01-11 4:05 ` Anoop P A @ 2011-01-13 7:53 ` Kevin D. Kissell 1 sibling, 0 replies; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-13 7:53 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips Further interesting data point: If I specify "nowait" on the command line, I get much better behavior on the 2.6.36 and 2.6.37 kernels. In particular, 2.6.37, which hung after the "Switching to clocksource MIPS" even booting with a single TC, gets far enough to enable swap space even with 4 TCs running. I note that there was historically a problem with getting SMTC to work with the wait-with-interrupts-disabled idle wait mode. I had it working back in 2.5.2x, but something seems to have gotten broken in that 2.6.32 to 2.6.36 interval... Regards, Kevin K. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2011-01-04 13:02 ` Anoop P A 2011-01-04 14:37 ` Anoop P A @ 2011-01-04 17:40 ` Kevin D. Kissell 2011-01-05 13:09 ` Anoop P A 1 sibling, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2011-01-04 17:40 UTC (permalink / raw) To: Anoop P A; +Cc: STUART VENTERS, Anoop P.A., linux-mips On 01/04/11 05:02, Anoop P A wrote: > On Tue, 2011-01-04 at 00:17 -0800, Kevin D. Kissell wrote: >> Those interrupt counters show that IPIs are being taken everywhere, >> though very few by CPUs 5 and 6. If I understand the configuration >> correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI > Yes CPU4 is in second VPE > >> rate, *if* we're looking at a tickless kernel under low load. But there > No it was not the tickless kernel.I had selected 250 MHz timer. can't we > expect IPI / timer interrupt for all the threads in this case ?. In that case, you should expect a distribution of timer interrupts that favors the low-numbered TCs within the VPE, as you do in VPE0, and a distribution of IPIs that is sort-of the inverse, as you do in VPE0. But the low counts on VPE1 are indeed suspicious, as you note. >> may be a clue there to part of your problem. I have no idea why the >> behavior would have changed from 2.6.36 to 2.6.37, but it looks as if >> you're getting your clock interrupts through the MSP CIC interrupt >> controller on VPE 0. There's nothing symmetric for VPE1. The Malta >> example code is perhaps deceptively simple, in that both VPEs have their >> count/compare indication wired directly to the 2 clock interrupt inputs, >> so that having both of them running with only a single set of irq state >> just works. I don't know whether the MSP CIC timer interrupt is a > In my case it is separate irq. MSP_INT_VPE1_TIMER (34) and > MSP_INT_VPE0_TIMER (25) are wired to CIC . CIC interrupt has been > connected to cpu irq 6. > > I can reproduce cpu stall in VSMP mode If I don't setup VPE1 timer > interrupt . 
Don't we have support for separate irq in SMTC > implementation ?.. There are hooks for platform-specific SMTC support, which is implemented for the Malta in arch/mips/mti-malta/malta-smtc.c. See msmtc_init_secondary(), for example, where the clock/compare, profile, and IPI interrupts are armed for VPE 1, while I/O peripheral interrupts are inhibited. >> gating of the VPE0 count/compare output, or whether it's it's own >> interval timer, but I suspect that you may need to do some further >> low-level initialization in the platform-specific code to set up an >> interrupt on the VPE1 side. I don't think the snippet you've got below >> would work as written. > The routine which I copied works fine for VSMP mode . > > / # cat /proc/interrupts > CPU0 CPU1 > 0: 187 254 MIPS IPI_resched > 1: 77 174 MIPS IPI_call > 6: 0 0 MIPS MSP CIC cascade > 8: 0 0 MSP_CIC Softreset button > 9: 0 0 MSP_CIC Standby switch > 21: 0 0 MSP_CIC MSP PER cascade > 25: 37077 0 MSP_CIC timer > 27: 188 0 MSP_CIC serial > 34: 0 36986 MSP_CIC timer > > Do I want to change anything specific for SMTC ? . If it works (which I doubt), then we can critique stylistic points like using if ((1==get_current_vpe()) Instead of the more readable and general if (get_current_vpe()> 0) But I think you're generally looking in the wrong place. Look at the Malta code and see what's done where. The initial SMTC code had a lot of Malta assumptions in the main line that I pushed out to platform code in later patches. I can see how things could be made even more modular, but for the moment I think it's just that there's some stuff that ought to be done in a "msp_smtc.c" file that doesn't exist in 2.6.37. Regards, Kevin K. > > >> If it's purely an issue with clock distribution on VPE1, then a boot >> with maxvpes=1 maxtcs=4 should be stable. > Yes the kernel seems to be stable if I boot with maxvpes=1 maxtcs=4 . > >> /K. 
>> >> On 1/3/2011 11:20 AM, Anoop P A wrote: >>> Hi Kevin, >>> >>> On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: >>>> The very first SMTC implementations didn't support full kernel-mode >>>> preemption, which anyway wasn't a priority, given the hardware event >>>> response support in MIPS MT. I believe it was later made compatible, >>>> but it was never extensively exercised. Since SMTC has fingers in some >>>> pretty low-level atomicity mechanisms, if a new, parallel set was >>>> implemented for RCU, I can easily imagine that nobody has yet >>>> implemented SMTC-ified variants of that set. >>>> >>>> Your last statement isn't very clear, though. Are you saying that if >>>> you configure for no forced preemption and with TREE_CPU, the 2.6.37 >>>> kernel boots all the way up, or that it simply hangs later? What's the >>>> last rev kernel that actually boots all the way up? >>> I have debugged this a bit more. It seems that kernel getting stalled >>> while executing on TC's of second VPE . >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>> by 1, t=2504 jiffies) >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>> by 1, t=10036 jiffies) >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>> by 1, t=17568 jiffies) >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>> by 1, t=25100 jiffies) >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>> by 1, t=32632 jiffies) >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected >>> by 1, t=40164 jiffies) >>> >>> With CONFIG_TREE_CPU we were not hitting this scenario very often. >>> However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. >>> >>> I presume some issue in my timer setup . I am not seeing timer interrupt >>> (or IPI interrupt) getting incremented for VPE1 tcs on a completely >>> booted 2.6.32-stable kernel. 
>>> >>> / # cat /proc/interrupts >>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 >>> CPU6 >>> 1: 148 15023 15140 15093 3779 8 >>> 2 MIPS SMTC_IPI >>> 6: 0 0 0 0 0 0 >>> 0 MIPS MSP CIC cascade >>> 8: 0 0 0 0 0 0 >>> 0 MSP_CIC Softreset button >>> 9: 0 0 0 0 0 0 >>> 0 MSP_CIC Standby switch >>> 21: 0 0 0 0 0 0 >>> 0 MSP_CIC MSP PER cascade >>> 25: 15113 341 4 7 0 0 >>> 0 MSP_CIC timer >>> 27: 260 9 0 1 0 0 >>> 0 MSP_CIC serial >>> 34: 0 0 0 0 0 0 >>> 0 MSP_CIC timer >>> >>> Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. >>> >>> I have tried setting up VPE1 timer from get_co_compare_int as follows >>> >>> unsigned int __cpuinit get_c0_compare_int(void) >>> { >>> if ((1==get_current_vpe())&& !vpe1_timr_installed){ >>> >>> memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); >>> >>> setup_irq(MSP_INT_VPE1_TIMER,&timer_vpe1); >>> vpe1_timr_installed++; >>> } >>> return (get_current_vpe() ? MSP_INT_VPE1_TIMER : >>> MSP_INT_VPE0_TIMER); >>> } >>> >>> Thanks >>> Anoop ^ permalink raw reply [flat|nested] 68+ messages in thread
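The per-VPE timer selection discussed in this message comes down to extracting the CurVPE field from TCBind and mapping it to an IRQ number. Below is a user-space sketch of that logic only, using the more general "vpe > 0" test suggested above. The IRQ numbers 25 and 34 are the ones quoted in the thread; the shift of 0 and the 4-bit mask are my reading of mipsmtregs.h and should be treated as assumptions, and both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed field layout of TCBind.CurVPE (see lead-in). */
#define TCBIND_CURVPE_SHIFT 0
#define TCBIND_CURVPE       0x0fu

/* Timer IRQ numbers as quoted in the thread. */
#define MSP_INT_VPE0_TIMER 25
#define MSP_INT_VPE1_TIMER 34

/* Extract the VPE number from a raw TCBind value. */
static unsigned int vpe_from_tcbind(uint32_t tcbind)
{
	return (tcbind >> TCBIND_CURVPE_SHIFT) & TCBIND_CURVPE;
}

/* Any VPE other than 0 uses the VPE1 timer interrupt. */
static unsigned int timer_irq_for_vpe(unsigned int vpe)
{
	return vpe > 0 ? MSP_INT_VPE1_TIMER : MSP_INT_VPE0_TIMER;
}
```

This covers only the choice of IRQ number. Registering the second irqaction and arming the interrupt on VPE1 is the platform-specific part that, per the discussion above, belongs in the platform SMTC hooks.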
* Re: SMTC support status in latest git head. 2011-01-04 17:40 ` Kevin D. Kissell @ 2011-01-05 13:09 ` Anoop P A 0 siblings, 0 replies; 68+ messages in thread From: Anoop P A @ 2011-01-05 13:09 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: STUART VENTERS, Anoop P.A., linux-mips [-- Attachment #1: Type: text/plain, Size: 7962 bytes --] On Tue, 2011-01-04 at 09:40 -0800, Kevin D. Kissell wrote: > On 01/04/11 05:02, Anoop P A wrote: > > On Tue, 2011-01-04 at 00:17 -0800, Kevin D. Kissell wrote: > >> Those interrupt counters show that IPIs are being taken everywhere, > >> though very few by CPUs 5 and 6. If I understand the configuration > >> correctly, CPU 4 is a TC in VPE 1, and it's getting a reasonable IPI > > Yes CPU4 is in second VPE > > > >> rate, *if* we're looking at a tickless kernel under low load. But there > > No it was not the tickless kernel.I had selected 250 MHz timer. can't we > > expect IPI / timer interrupt for all the threads in this case ?. > > In that case, you should expect a distribution of timer interrupts that > favors the low-numbered TCs within the VPE, as you do in VPE0, and a > distribution of IPIs that is sort-of the inverse, as you do in VPE0. > But the low counts on VPE1 are indeed suspicious, as you note. > > >> may be a clue there to part of your problem. I have no idea why the > >> behavior would have changed from 2.6.36 to 2.6.37, but it looks as if > >> you're getting your clock interrupts through the MSP CIC interrupt > >> controller on VPE 0. There's nothing symmetric for VPE1. The Malta > >> example code is perhaps deceptively simple, in that both VPEs have their > >> count/compare indication wired directly to the 2 clock interrupt inputs, > >> so that having both of them running with only a single set of irq state > >> just works. I don't know whether the MSP CIC timer interrupt is a > > In my case it is separate irq. MSP_INT_VPE1_TIMER (34) and > > MSP_INT_VPE0_TIMER (25) are wired to CIC . 
CIC interrupt has been > > connected to cpu irq 6. > > > > I can reproduce cpu stall in VSMP mode If I don't setup VPE1 timer > > interrupt . Don't we have support for separate irq in SMTC > > implementation ?.. > > There are hooks for platform-specific SMTC support, which is implemented > for the Malta in arch/mips/mti-malta/malta-smtc.c. See > msmtc_init_secondary(), for example, where the clock/compare, profile, > and IPI interrupts are armed for VPE 1, while I/O peripheral interrupts > are inhibited. > > >> gating of the VPE0 count/compare output, or whether it's it's own > >> interval timer, but I suspect that you may need to do some further > >> low-level initialization in the platform-specific code to set up an > >> interrupt on the VPE1 side. I don't think the snippet you've got below > >> would work as written. > > The routine which I copied works fine for VSMP mode . > > > > / # cat /proc/interrupts > > CPU0 CPU1 > > 0: 187 254 MIPS IPI_resched > > 1: 77 174 MIPS IPI_call > > 6: 0 0 MIPS MSP CIC cascade > > 8: 0 0 MSP_CIC Softreset button > > 9: 0 0 MSP_CIC Standby switch > > 21: 0 0 MSP_CIC MSP PER cascade > > 25: 37077 0 MSP_CIC timer > > 27: 188 0 MSP_CIC serial > > 34: 0 36986 MSP_CIC timer > > > > Do I want to change anything specific for SMTC ? . > > If it works (which I doubt), then we can critique stylistic points like > using > > if ((1==get_current_vpe()) > > Instead of the more readable and general > > if (get_current_vpe()> 0) > > > But I think you're generally looking in the wrong place. Look at the > Malta code and see what's done where. The initial SMTC code had a lot > of Malta assumptions in the main line that I pushed out to platform code > in later patches. I can see how things could be made even more modular, > but for the moment I think it's just that there's some stuff that ought > to be done in a "msp_smtc.c" file that doesn't exist in 2.6.37. Yes , I am doing similar stuff in msp_smtc.c . Attaching code for your reference. 
I am not seeing a VPE1 timer interrupt. > > Regards, > > Kevin K. > > > > > >> If it's purely an issue with clock distribution on VPE1, then a boot > >> with maxvpes=1 maxtcs=4 should be stable. > > Yes the kernel seems to be stable if I boot with maxvpes=1 maxtcs=4 . > > > >> /K. > >> > >> On 1/3/2011 11:20 AM, Anoop P A wrote: > >>> Hi Kevin, > >>> > >>> On Mon, 2011-01-03 at 08:14 -0800, Kevin D. Kissell wrote: > >>>> The very first SMTC implementations didn't support full kernel-mode > >>>> preemption, which anyway wasn't a priority, given the hardware event > >>>> response support in MIPS MT. I believe it was later made compatible, > >>>> but it was never extensively exercised. Since SMTC has fingers in some > >>>> pretty low-level atomicity mechanisms, if a new, parallel set was > >>>> implemented for RCU, I can easily imagine that nobody has yet > >>>> implemented SMTC-ified variants of that set. > >>>> > >>>> Your last statement isn't very clear, though. Are you saying that if > >>>> you configure for no forced preemption and with TREE_CPU, the 2.6.37 > >>>> kernel boots all the way up, or that it simply hangs later? What's the > >>>> last rev kernel that actually boots all the way up? > >>> I have debugged this a bit more. It seems that kernel getting stalled > >>> while executing on TC's of second VPE . 
> >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>> by 1, t=2504 jiffies) > >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>> by 1, t=10036 jiffies) > >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>> by 1, t=17568 jiffies) > >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>> by 1, t=25100 jiffies) > >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>> by 1, t=32632 jiffies) > >>> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 4 5 6} (detected > >>> by 1, t=40164 jiffies) > >>> > >>> With CONFIG_TREE_CPU we were not hitting this scenario very often. > >>> However with CONFIG_PREEMPT_TREE_CPU stall happens most of the time. > >>> > >>> I presume some issue in my timer setup . I am not seeing timer interrupt > >>> (or IPI interrupt) getting incremented for VPE1 tcs on a completely > >>> booted 2.6.32-stable kernel. > >>> > >>> / # cat /proc/interrupts > >>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > >>> CPU6 > >>> 1: 148 15023 15140 15093 3779 8 > >>> 2 MIPS SMTC_IPI > >>> 6: 0 0 0 0 0 0 > >>> 0 MIPS MSP CIC cascade > >>> 8: 0 0 0 0 0 0 > >>> 0 MSP_CIC Softreset button > >>> 9: 0 0 0 0 0 0 > >>> 0 MSP_CIC Standby switch > >>> 21: 0 0 0 0 0 0 > >>> 0 MSP_CIC MSP PER cascade > >>> 25: 15113 341 4 7 0 0 > >>> 0 MSP_CIC timer > >>> 27: 260 9 0 1 0 0 > >>> 0 MSP_CIC serial > >>> 34: 0 0 0 0 0 0 > >>> 0 MSP_CIC timer > >>> > >>> Can't we use separate timer interrupts for VPE1 and VPE0 in SMTC ?. > >>> > >>> I have tried setting up VPE1 timer from get_co_compare_int as follows > >>> > >>> unsigned int __cpuinit get_c0_compare_int(void) > >>> { > >>> if ((1==get_current_vpe())&& !vpe1_timr_installed){ > >>> > >>> memcpy(&timer_vpe1,&c0_compare_irqaction,sizeof(timer_vpe1)); > >>> > >>> setup_irq(MSP_INT_VPE1_TIMER,&timer_vpe1); > >>> vpe1_timr_installed++; > >>> } > >>> return (get_current_vpe() ? 
MSP_INT_VPE1_TIMER : > >>> MSP_INT_VPE0_TIMER); > >>> } > >>> > >>> Thanks > >>> Anoop > [-- Attachment #2: msp_smtc.c --] [-- Type: text/x-csrc, Size: 3230 bytes --] /* * MSP71xx Platform-specific hooks for SMP operation. * Started from malta-smtc.c. */ #include <linux/irq.h> #include <linux/init.h> #include <linux/sched.h> #include <asm/mipsregs.h> #include <asm/mipsmtregs.h> #include <asm/smtc.h> #include <asm/smtc_ipi.h> /* VPE/SMP Prototype implements platform interfaces directly */ /* * Cause the specified action to be performed on a targeted "CPU" */ static void msp_smtc_send_ipi_single(int cpu, unsigned int action) { /* "CPU" may be TC of same VPE, VPE of same CPU, or different CPU */ smtc_send_ipi(cpu, LINUX_SMP_IPI, action); } static void msp_smtc_send_ipi_mask(const struct cpumask *mask, unsigned int action) { unsigned int i; for_each_cpu(i, mask) msp_smtc_send_ipi_single(i, action); } /* * Post-config but pre-boot cleanup entry point */ static int prev_vpe; static void __cpuinit msp_smtc_init_secondary(void) { void smtc_init_secondary(void); int myvpe; myvpe = read_c0_tcbind() & TCBIND_CURVPE; /* Change status register when we switch to new VPE*/ if ((myvpe != prev_vpe) && (myvpe > 0)) { change_c0_status(ST0_IM, STATUSF_IP0 | STATUSF_IP1 | STATUSF_IP6 | STATUSF_IP7); } prev_vpe = myvpe; smtc_init_secondary(); } /* * Platform "CPU" startup hook */ static void __cpuinit msp_smtc_boot_secondary(int cpu, struct task_struct *idle) { smtc_boot_secondary(cpu, idle); } /* * SMP initialization finalization entry point */ static void __cpuinit msp_smtc_smp_finish(void) { smtc_smp_finish(); } /* * Hook for after all CPUs are online */ static void msp_smtc_cpus_done(void) { } /* * Platform SMP pre-initialization * * As noted above, we can assume a single CPU for now * but it may be multithreaded. 
*/ static void __init msp_smtc_smp_setup(void) { /* * we won't get the definitive value until * we've run smtc_prepare_cpus later, but */ if (read_c0_config3() & (1 << 2)) smp_num_siblings = smtc_build_cpu_map(0); } static void __init msp_smtc_prepare_cpus(unsigned int max_cpus) { smtc_prepare_cpus(max_cpus); } struct plat_smp_ops msp_smtc_smp_ops = { .send_ipi_single = msp_smtc_send_ipi_single, .send_ipi_mask = msp_smtc_send_ipi_mask, .init_secondary = msp_smtc_init_secondary, .smp_finish = msp_smtc_smp_finish, .cpus_done = msp_smtc_cpus_done, .boot_secondary = msp_smtc_boot_secondary, .smp_setup = msp_smtc_smp_setup, .prepare_cpus = msp_smtc_prepare_cpus, }; #if 0 /* TODO */ #ifdef CONFIG_MIPS_MT_SMTC_IRQAFF /* * IRQ affinity hook */ int plat_set_irq_affinity(unsigned int irq, const struct cpumask *affinity) { cpumask_t tmask; int cpu = 0; void smtc_set_irq_affinity(unsigned int irq, cpumask_t aff); cpumask_copy(&tmask, affinity); for_each_cpu(cpu, affinity) { if ((cpu_data[cpu].vpe_id != 0) || !cpu_online(cpu)) cpu_clear(cpu, tmask); } cpumask_copy(irq_desc[irq].affinity, &tmask); if (cpus_empty(tmask)) /* * We could restore a default mask here, but the * runtime code can anyway deal with the null set */ printk(KERN_WARNING "IRQ affinity leaves no legal CPU for IRQ %d\n", irq); /* Do any generic SMTC IRQ affinity setup */ smtc_set_irq_affinity(irq, tmask); return 0; } #endif /* CONFIG_MIPS_MT_SMTC_IRQAFF */ #endif ^ permalink raw reply [flat|nested] 68+ messages in thread
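To make the effect of the change_c0_status() call in msp_smtc_init_secondary() concrete, here is a user-space sketch of the read-modify-write it performs, as I understand the accessor: bits selected by the first argument are replaced by the corresponding bits of the second, and everything else is preserved. The bit positions assume the standard MIPS32 Status layout (IM field in bits 15:8), and change_bits() is a hypothetical stand-in for the real accessor.

```c
#include <stdint.h>

/* Assumed Status register layout: IM field occupies bits 15:8. */
#define ST0_IM      0x0000ff00u
#define STATUSF_IP0 (1u << 8)
#define STATUSF_IP1 (1u << 9)
#define STATUSF_IP6 (1u << 14)
#define STATUSF_IP7 (1u << 15)

/* Replace the bits selected by 'change' with those of 'val'. */
static uint32_t change_bits(uint32_t old, uint32_t change, uint32_t val)
{
	return (old & ~change) | (val & change);
}
```

Applied with ST0_IM and the four IP bits from the attachment, the IM field becomes 0xc300: the two software interrupts plus CPU lines 6 and 7 are enabled on the new VPE and every other interrupt line is masked there, while bits outside the IM field are left alone.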
* SMTC support status in latest git head. @ 2010-12-08 13:48
From: Anoop P.A. @ 2010-12-08 13:48 UTC
To: linux-mips

Hi list,

Is anybody aware of the SMTC support status in the latest git sources? I have tried testing an SMTC kernel for Malta in qemu / OVP without any success (the emulators do not work for the 34K).

I am trying to bring up SMTC Linux support for a MIPS 34K based SoC (MSP71xx family).

While booting, the kernel hangs in the delay loop calibration. I am getting only one interrupt from the timer. With a similar SMTC platform support file (changed to map the smp_ops structure), the 2.6.24-stable branch kernel (where the latest timer structure was introduced) boots fine.

[ 0.000000] Linux version 2.6.37-rc1-pmc-00197-g5bfd3ba-dirty (paanoop1@paanoop1-desktop) (gcc version 4.5.1 (GCC) ) #168 SMP PREEMPT Wed Dec 8 19:19:490
[ 0.000000] DSPRAM0: PA=1c100000,Size=00008000,enabled
[ 0.000000] UART clock set to 50000000
[ 0.000000] CPU revision is: 00019548 (MIPS 34Kc)
[ 0.000000] Determined physical RAM map:
[ 0.000000] memory: 00001000 @ 00000000 (reserved)
[ 0.000000] memory: 000ff000 @ 00001000 (usable)
[ 0.000000] memory: 003f2000 @ 00100000 (reserved)
[ 0.000000] memory: 0fad9200 @ 004f2000 (usable)
[ 0.000000] Wasting 32 bytes for tracking 1 unused pages
[ 0.000000] Zone PFN ranges:
[ 0.000000] Normal 0x00000000 -> 0x0000ffcb
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[1] active PFN ranges
[ 0.000000] 0: 0x00000000 -> 0x0000ffcb
[ 0.000000] 6 available secondary CPU TC(s)
[ 0.000000] PERCPU: Embedded 7 pages/cpu @81203000 s6464 r8192 d14016 u32768
[ 0.000000] pcpu-alloc: s6464 r8192 d14016 u32768 alloc=8*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64971
[ 0.000000] Kernel command line: console=ttyS0,57600
[ 0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[ 0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 64kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 0.000000] Writing ErrCtl register=00000000
[ 0.000000] Readback ErrCtl register=00000000
[ 0.000000] Memory: 254360k/257888k available (3081k kernel code, 3528k reserved, 653k data, 200k init, 0k highmem)
[ 0.000000] Preemptable hierarchical RCU implementation.
[ 0.000000] NR_IRQS:128
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Clock rate set to 600000000
[ 0.000000] Calibrating delay loop...

Any idea how to debug the issue?

Thanks,
Anoop
* Re: SMTC support status in latest git head. @ 2010-12-09 17:07
From: Ralf Baechle @ 2010-12-09 17:07 UTC
To: Anoop P.A.; +Cc: linux-mips, Kevin D. Kissell

On Wed, Dec 08, 2010 at 05:48:48AM -0800, Anoop P.A. wrote:

> Any body is aware of SMTC support status in latest git sources?. I have tried testing SMTC kernel for malta in qemu / OVP without any success ( emulators not working for 34k).

Correct. MTI's MIPSsim is the only simulator that supports multithreading afaik. SMTC is not terribly popular, so it doesn't receive the regular testing it should get, given that it's also a complex beast.

> I am trying to bring up SMTC Linux support for an mips34K based soc ( MSP71xx family).
>
> While booting , kernel getting hung on calibrate loop delay. I am getting only one interrupt from timer. With similar smtc platform support file ( changed to map smp_ops structure) 2.6.24-stable branch kernel ( where latest timer structure introduced) boots fine.

Timer interrupts work differently in SMTC. Each CPU needs a clock event device, that is, an interrupt timer, but the CPU core is restricted to just one per VPE, so in a typical SMTC setup multiple CPUs, aka TCs, will have to share an interrupt timer. The way this works is that one of the TCs associated with a VPE will take the timer interrupt and forward it to the other TCs associated with the same VPE (if any) through a software IPI mechanism. The race conditions that need to be handled to make this work are ... interesting.

Your problem seems to be simpler as you only get a single timer interrupt.

  Ralf
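The forwarding scheme described in the message above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the kernel's code: `struct smtc_model`, `send_timer_ipi()` and `timer_interrupt()` are hypothetical names invented for illustration, the IPI is modelled as a plain counter increment, and none of the race handling mentioned above is attempted.

```c
#include <assert.h>

#define MAX_TCS 8

/* Hypothetical model: one entry per TC (a Linux "CPU" under SMTC). */
struct smtc_model {
	int n_tcs;                         /* number of TCs in use           */
	int vpe_of_tc[MAX_TCS];            /* VPE each TC is bound to        */
	unsigned long ticks[MAX_TCS];      /* timer ticks seen by each TC    */
	unsigned long ipis_sent[MAX_TCS];  /* software IPIs queued to each TC */
};

/* Model of forwarding one tick to a sibling TC through a software IPI. */
static void send_timer_ipi(struct smtc_model *m, int tc)
{
	m->ipis_sent[tc]++;
	m->ticks[tc]++;
}

/*
 * Runs on the TC that actually took the per-VPE timer interrupt:
 * account for its own tick, then forward the tick to every other TC
 * bound to the same VPE. TCs on other VPEs are left alone, since
 * their own VPE's timer serves them.
 */
static void timer_interrupt(struct smtc_model *m, int taking_tc)
{
	assert(taking_tc >= 0 && taking_tc < m->n_tcs);

	m->ticks[taking_tc]++;
	for (int tc = 0; tc < m->n_tcs; tc++) {
		if (tc != taking_tc &&
		    m->vpe_of_tc[tc] == m->vpe_of_tc[taking_tc])
			send_timer_ipi(m, tc);
	}
}
```

With four TCs bound as {VPE0, VPE0, VPE1, VPE1}, a call to `timer_interrupt(&m, 0)` ticks TC 0 directly, ticks TC 1 through one modelled IPI, and leaves TCs 2 and 3 untouched.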
* Re: SMTC support status in latest git head. @ 2010-12-09 18:52
From: Kevin D. Kissell @ 2010-12-09 18:52 UTC
To: Anoop P.A.; +Cc: linux-mips

I used to do occasional tests and damage control patches for SMTC, but haven't had the time and resources for the past year or so.

The "Calibrating delay loop" hang is an absolutely classic hang in SMTC systems that stems from the interrupt management system not being properly set up. Ralf alluded to the intra-TC timer propagation protocol, but your problem could just as easily (more easily, actually) have to do with enable mask management. In order to keep multiple threads from "convoying" into interrupt handlers chasing a single event, SMTC manipulates the interrupt enable mask at entry into an interrupt exception to ensure that only the initial TC goes after it. The interrupt is unmasked once the interrupt handler has quenched the source and invoked the IRQ ack function. Unfortunately, generic timer functions don't always do the canonical source quench performed by most device driver interrupt handlers.

I tried to make all this self-contained in generic architecture-specific code, but at some point it ended up being cleaner and more efficient to have *some* hooks in platform specific timer code. It was there for Malta in the kernel.org mainline once upon a time, and I *thought* we'd propagated working code for the initial PMC-Sierra 34K-based SoCs at least as far as linux-mips.org, but the source tree has been considerably reorganized - there was a time when some of the hooks were under arch/mips/mips-boards/generic, which no longer exists - and I'm not sure where to point you. Git and grep are your friends.

The first order of business is to break into that hung timer calibration loop and dump the CP0 registers for the VPE and the TCs, in particular checking the interrupt enable mask in Status against the pending interrupts in the Cause register. If you're seeing the timer interrupt's bit set in Cause, but clear in Status, you need to fix the SMTC interrupt mask hook for your platform timer.

If that's *not* it, check to see if you're building for "tickless" operation. Tickless ends up being really important for SMTC, and I did get it working properly back in 2008, but the SMTC-specific cevt-smtc.c code uses common functions in cevt-r4k.c, and I've seen some patches to cevt-r4k.c going by that I rather doubt were ever tested against an SMTC build/platform. There might have been breakage there, and configuring to use a fixed interval timer (say, 100Hz) would be a way to test that hypothesis.

          Regards,

          Kevin K.

On 12/08/10 05:48, Anoop P.A. wrote:
> Hi list,
>
> Any body is aware of SMTC support status in latest git sources?. I have tried testing SMTC kernel for malta in qemu / OVP without any success ( emulators not working for 34k).
>
> I am trying to bring up SMTC Linux support for an mips34K based soc ( MSP71xx family).
>
> While booting , kernel getting hung on calibrate loop delay. I am getting only one interrupt from timer. With similar smtc platform support file ( changed to map smp_ops structure) 2.6.24-stable branch kernel ( where latest timer structure introduced) boots fine.
>
> [ 0.000000] Linux version 2.6.37-rc1-pmc-00197-g5bfd3ba-dirty (paanoop1@paanoop1-desktop) (gcc version 4.5.1 (GCC) ) #168 SMP PREEMPT Wed Dec 8 19:19:490
> [ 0.000000] DSPRAM0: PA=1c100000,Size=00008000,enabled
> [ 0.000000] UART clock set to 50000000
> [ 0.000000] CPU revision is: 00019548 (MIPS 34Kc)
> [ 0.000000] Determined physical RAM map:
> [ 0.000000] memory: 00001000 @ 00000000 (reserved)
> [ 0.000000] memory: 000ff000 @ 00001000 (usable)
> [ 0.000000] memory: 003f2000 @ 00100000 (reserved)
> [ 0.000000] memory: 0fad9200 @ 004f2000 (usable)
> [ 0.000000] Wasting 32 bytes for tracking 1 unused pages
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] Normal 0x00000000 -> 0x0000ffcb
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] early_node_map[1] active PFN ranges
> [ 0.000000] 0: 0x00000000 -> 0x0000ffcb
> [ 0.000000] 6 available secondary CPU TC(s)
> [ 0.000000] PERCPU: Embedded 7 pages/cpu @81203000 s6464 r8192 d14016 u32768
> [ 0.000000] pcpu-alloc: s6464 r8192 d14016 u32768 alloc=8*4096
> [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6
> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64971
> [ 0.000000] Kernel command line: console=ttyS0,57600
> [ 0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> [ 0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
> [ 0.000000] Primary data cache 64kB, 4-way, PIPT, no aliases, linesize 32 bytes
> [ 0.000000] Writing ErrCtl register=00000000
> [ 0.000000] Readback ErrCtl register=00000000
> [ 0.000000] Memory: 254360k/257888k available (3081k kernel code, 3528k reserved, 653k data, 200k init, 0k highmem)
> [ 0.000000] Preemptable hierarchical RCU implementation.
> [ 0.000000] NR_IRQS:128
> [ 0.000000] console [ttyS0] enabled
> [ 0.000000] Clock rate set to 600000000
> [ 0.000000] Calibrating delay loop...
>
> Any idea to debug the issue ?.
>
> Thanks,
> Anoop
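The enable-mask handling described in the message above (mask on exception entry, unmask after the handler has quenched the source and acked) can be sketched as a tiny stand-alone model. Everything here is an assumption made for illustration: `struct vpe_model` and the helper names are hypothetical, only the IM/IP bit fields are modelled, and this is not how the kernel's SMTC code is actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IM_SHIFT 8  /* Status.IM7..IM0 and Cause.IP7..IP0 both sit in bits 15:8 */

/* Hypothetical per-VPE view: only the interrupt mask and pending bits. */
struct vpe_model {
	uint32_t status;  /* Status.IM bits */
	uint32_t cause;   /* Cause.IP bits  */
};

static uint32_t irq_bit(int ip)
{
	assert(ip >= 0 && ip <= 7);
	return 1u << (IM_SHIFT + ip);
}

/* An interrupt line asserts: its IP bit becomes set in Cause. */
static void raise_irq(struct vpe_model *v, int ip)
{
	v->cause |= irq_bit(ip);
}

/* Would any TC on this VPE take interrupt 'ip' right now? */
static bool deliverable(const struct vpe_model *v, int ip)
{
	return (v->cause & v->status & irq_bit(ip)) != 0;
}

/*
 * Exception entry: the first TC to arrive masks the line, so sibling
 * TCs do not "convoy" into the handler after the same event.
 */
static bool irq_entry(struct vpe_model *v, int ip)
{
	if (!deliverable(v, ip))
		return false;
	v->status &= ~irq_bit(ip);
	return true;
}

/* The handler quenches the source, which drops the pending bit. */
static void quench_source(struct vpe_model *v, int ip)
{
	v->cause &= ~irq_bit(ip);
}

/* The ack hook restores the mask bit so the next event can be taken. */
static void irq_ack_unmask(struct vpe_model *v, int ip)
{
	v->status |= irq_bit(ip);
}
```

In this model, if a timer path quenches the source but never reaches `irq_ack_unmask()`, the line stays masked and a later `raise_irq()` is never deliverable, which matches the "only one timer interrupt" symptom described in the thread. Whether that is the actual cause on the MSP71xx platform is not something this sketch can establish.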
* RE: SMTC support status in latest git head. @ 2010-12-14 15:25
From: Anoop P.A. @ 2010-12-14 15:25 UTC
To: Kevin D. Kissell; +Cc: linux-mips

> it ended up being cleaner and more efficient to have *some* hooks in
> platform specific timer code. It was there for Malta in the kernel.org
> mainline once upon a time, and I *thought* we'd propagated working code
> for the initial PMC-Sierra 34K-based SoC's at least as far as
[Anoop P.A.] I was able to boot the 2.6.24-7 git sources with a change in cevt-r4k.c (c0_compare_int_pending changed to the following:
"return (read_c0_cause() >> cp0_compare_irq_shift) & (1ul << CAUSEB_IP)")

> linux-mips.org, but the source tree has been considerably reorganized -
> there was a time when some of the hooks were under
> arch/mips/mips-boards/generic, which no longer exists - and I'm not sure
> where to point you. Git and grep are your friends.
[Anoop P.A.] The Malta code has been moved to arch/mips/mti-malta/.
Can you recall which version of the l-m-o kernel had known working SMTC support?
>
> The first order of business is to break into that hung timer calibration
> loop and dump the CP0 registers for the VPE and the TCs, in particular
> checking the interrupt enable mask in Status against the pending
> interrupts in the Cause register. If you're seeing the timer
> interrupt's bit set in Cause, but clear in Status, you need to fix the
> SMTC interrupt mask hook for your platform timer.
[Anoop P.A.] I tried dumping the registers from the calibration while loop. It looks like the timer interrupt bit stays high in both the Cause and Status registers (in my case the timer interrupt is connected to the cascaded CIC interrupt, which is connected to irq 6 (C_IRQ4)). Detailed log pasted below.

> check to see if you're building for "tickless" operation. Tickless ends
> up being really important for SMTC, and I did get it working properly
> back in 2008, but I the SMTC-specific cevt-smtc.c code uses common
> functions in cevt-r4k.c, and I've seen some patches to cevt-r4k.c going
> by that I rather doubt were ever tested against an SMTC build/platform.
> There might have been breakage there, and configuring to use a fixed
> interval timer (say, 100Hz) would be a way to test that hypothesis.
[Anoop P.A.] I have tried both tickless and fixed-interval timer configurations.
>
> Regards,
>
> Kevin K.
[Anoop P.A.] Thanks very much for your and Ralf's detailed responses.
>
[Anoop P.A.]
[ 0.000000] Writing ErrCtl register=00000000
[ 0.000000] Readback ErrCtl register=00000000
[ 0.000000] Memory: 254384k/257912k available (3062k kernel code, 3528k reserved, 648k data, 200k init, 0k highmem)
[ 0.000000] Preemptable hierarchical RCU implementation.
[ 0.000000] NR_IRQS:128
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Clock rate set to 600000000
[ 0.000000] Calibrating delay loop...
=== MIPS MT State Dump ===
[ 0.000000] -- Global State --
[ 0.000000] MVPControl Passed: 00000000
[ 0.000000] MVPControl Read: 00000000
[ 0.000000] MVPConf0 : a8008406
[ 0.000000] -- per-VPE State --
[ 0.000000] VPE 0
[ 0.000000] VPEControl : 00000000
[ 0.000000] VPEConf0 : 800f0003
[ 0.000000] VPE0.Status : 11004001
[ 0.000000] VPE0.EPC : 80100000 _stext+0x0/0x10
[ 0.000000] VPE0.Cause : 40804000
[ 0.000000] VPE0.Config7 : 00010000
[ 0.000000] VPE 1
[ 0.000000] VPEControl : 00060000
[ 0.000000] VPEConf0 : 800f0000
[ 0.000000] VPE1.Status : 00408305
[ 0.000000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84
[ 0.000000] VPE1.Cause : 40000200
[ 0.000000] VPE1.Config7 : 00010000
[ 0.000000] -- per-TC State --
[ 0.000000] TC 0 (current TC with VPE EPC above)
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00000000
[ 0.000000] TCRestart : 8010d860 mips_mt_regdump+0x2f0/0x3c4
[ 0.000000] TCHalt : 00000000
[ 0.000000] TCContext : 00000000
[ 0.000000] TC 1
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00200001
[ 0.000000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 00180000
[ 0.000000] TC 2
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00400001
[ 0.000000] TCRestart : 7ffffffc 0x7ffffffc
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 00300000
[ 0.000000] TC 3
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00600001
[ 0.000000] TCRestart : fff7ffae 0xfff7ffae
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 00480000
[ 0.000000] TC 4
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00800001
[ 0.000000] TCRestart : f3fff7fe 0xf3fff7fe
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 00600000
[ 0.000000] TC 5
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00a00001
[ 0.000000] TCRestart : 7ffffbfe 0x7ffffbfe
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 00780000
[ 0.000000] TC 6
[ 0.000000] TCStatus : 00000000
[ 0.000000] TCBind : 00c00001
[ 0.000000] TCRestart : ffff7ffe 0xffff7ffe
[ 0.000000] TCHalt : 00000001
[ 0.000000] TCContext : 00900000
[ 0.000000] Counter Interrupts taken per CPU (TC)
[ 0.000000] 0: 0
[ 0.000000] 1: 0
[ 0.000000] 2: 0
[ 0.000000] 3: 0
[ 0.000000] 4: 0
[ 0.000000] 5: 0
[ 0.000000] 6: 0
[ 0.000000] 7: 0
[ 0.000000] Self-IPI invocations:
[ 0.000000] 0: 0
[ 0.000000] 1: 0
[ 0.000000] 2: 0
[ 0.000000] 3: 0
[ 0.000000] 4: 0
[ 0.000000] 5: 0
[ 0.000000] 6: 0
[ 0.000000] 7: 0
[ 0.000000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
[ 0.000000] 0 Recoveries of "stolen" FPU
[ 0.000000] ===========================
[ 0.000000] In platform cic dispatch cic_mask=0x22000 stat=0x2402000f pend=0x20000
[ 0.010000] === MIPS MT State Dump ===
[ 0.010000] -- Global State --
[ 0.010000] MVPControl Passed: 00000000
[ 0.010000] MVPControl Read: 00000000
[ 0.010000] MVPConf0 : a8008406
[ 0.010000] -- per-VPE State --
[ 0.010000] VPE 0
[ 0.010000] VPEControl : 00000000
[ 0.010000] VPEConf0 : 800f0003
[ 0.010000] VPE0.Status : 18004000
[ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4
[ 0.010000] VPE0.Cause : 40804000
[ 0.010000] VPE0.Config7 : 00010000
[ 0.010000] VPE 1
[ 0.010000] VPEControl : 00060000
[ 0.010000] VPEConf0 : 800f0000
[ 0.010000] VPE1.Status : 00408305
[ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84
[ 0.010000] VPE1.Cause : 40000200
[ 0.010000] VPE1.Config7 : 00010000
[ 0.010000] -- per-TC State --
[ 0.010000] TC 0 (current TC with VPE EPC above)
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00000000
[ 0.010000] TCRestart : 803f791c printk+0xc/0x30
[ 0.010000] TCHalt : 00000000
[ 0.010000] TCContext : 00000000
[ 0.010000] TC 1
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00200001
[ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00180000
[ 0.010000] TC 2
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00400001
[ 0.010000] TCRestart : 7ffffffc 0x7ffffffc
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00300000
[ 0.010000] TC 3
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00600001
[ 0.010000] TCRestart : fff7ffae 0xfff7ffae
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00480000
[ 0.010000] TC 4
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00800001
[ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00600000
[ 0.010000] TC 5
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00a00001
[ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00780000
[ 0.010000] TC 6
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00c00001
[ 0.010000] TCRestart : ffff7ffe 0xffff7ffe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00900000
[ 0.010000] Counter Interrupts taken per CPU (TC)
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] 2: 0
[ 0.010000] 3: 0
[ 0.010000] 4: 0
[ 0.010000] 5: 0
[ 0.010000] 6: 0
[ 0.010000] 7: 0
[ 0.010000] Self-IPI invocations:
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] 2: 0
[ 0.010000] 3: 0
[ 0.010000] 4: 0
[ 0.010000] 5: 0
[ 0.010000] 6: 0
[ 0.010000] 7: 0
[ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] 0 Recoveries of "stolen" FPU
[ 0.010000] ===========================
[ 0.010000] === MIPS MT State Dump ===
[ 0.010000] -- Global State --
[ 0.010000] MVPControl Passed: 00000000
[ 0.010000] MVPControl Read: 00000000
[ 0.010000] MVPConf0 : a8008406
[ 0.010000] -- per-VPE State --
[ 0.010000] VPE 0
[ 0.010000] VPEControl : 00000000
[ 0.010000] VPEConf0 : 800f0003
[ 0.010000] VPE0.Status : 18004000
[ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4
[ 0.010000] VPE0.Cause : 40804000
[ 0.010000] VPE0.Config7 : 00010000
[ 0.010000] VPE 1
[ 0.010000] VPEControl : 00060000
[ 0.010000] VPEConf0 : 800f0000
[ 0.010000] VPE1.Status : 00408305
[ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84
[ 0.010000] VPE1.Cause : 40000200
[ 0.010000] VPE1.Config7 : 00010000
[ 0.010000] -- per-TC State --
[ 0.010000] TC 0 (current TC with VPE EPC above)
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00000000
[ 0.010000] TCRestart : 803f791c printk+0xc/0x30
[ 0.010000] TCHalt : 00000000
[ 0.010000] TCContext : 00000000
[ 0.010000] TC 1
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00200001
[ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00180000
[ 0.010000] TC 2
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00400001
[ 0.010000] TCRestart : 7ffffffc 0x7ffffffc
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00300000
[ 0.010000] TC 3
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00600001
[ 0.010000] TCRestart : fff7ffae 0xfff7ffae
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00480000
[ 0.010000] TC 4
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00800001
[ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00600000
[ 0.010000] TC 5
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00a00001
[ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00780000
[ 0.010000] TC 6
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00c00001
[ 0.010000] TCRestart : ffff7ffe 0xffff7ffe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00900000
[ 0.010000] Counter Interrupts taken per CPU (TC)
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] 2: 0
[ 0.010000] 3: 0
[ 0.010000] 4: 0
[ 0.010000] 5: 0
[ 0.010000] 6: 0
[ 0.010000] 7: 0
[ 0.010000] Self-IPI invocations:
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] 2: 0
[ 0.010000] 3: 0
[ 0.010000] 4: 0
[ 0.010000] 5: 0
[ 0.010000] 6: 0
[ 0.010000] 7: 0
[ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] 0 Recoveries of "stolen" FPU
[ 0.010000] ===========================
[ 0.010000] === MIPS MT State Dump ===
[ 0.010000] -- Global State --
[ 0.010000] MVPControl Passed: 00000000
[ 0.010000] MVPControl Read: 00000000
[ 0.010000] MVPConf0 : a8008406
[ 0.010000] -- per-VPE State --
[ 0.010000] VPE 0
[ 0.010000] VPEControl : 00000000
[ 0.010000] VPEConf0 : 800f0003
[ 0.010000] VPE0.Status : 18004000
[ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4
[ 0.010000] VPE0.Cause : 40804000
[ 0.010000] VPE0.Config7 : 00010000
[ 0.010000] VPE 1
[ 0.010000] VPEControl : 00060000
[ 0.010000] VPEConf0 : 800f0000
[ 0.010000] VPE1.Status : 00408305
[ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84
[ 0.010000] VPE1.Cause : 40000200
[ 0.010000] VPE1.Config7 : 00010000
[ 0.010000] -- per-TC State --
[ 0.010000] TC 0 (current TC with VPE EPC above)
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00000000
[ 0.010000] TCRestart : 803f791c printk+0xc/0x30
[ 0.010000] TCHalt : 00000000
[ 0.010000] TCContext : 00000000
[ 0.010000] TC 1
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00200001
[ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00180000
[ 0.010000] TC 2
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00400001
[ 0.010000] TCRestart : 7ffffffc 0x7ffffffc
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00300000
[ 0.010000] TC 3
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00600001
[ 0.010000] TCRestart : fff7ffae 0xfff7ffae
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00480000
[ 0.010000] TC 4
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00800001
[ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00600000
[ 0.010000] TC 5
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00a00001
[ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00780000
[ 0.010000] TC 6
[ 0.010000] TCStatus : 00000000
[ 0.010000] TCBind : 00c00001
[ 0.010000] TCRestart : ffff7ffe 0xffff7ffe
[ 0.010000] TCHalt : 00000001
[ 0.010000] TCContext : 00900000
[ 0.010000] Counter Interrupts taken per CPU (TC)
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] 2: 0
[ 0.010000] 3: 0
[ 0.010000] 4: 0
[ 0.010000] 5: 0
[ 0.010000] 6: 0
[ 0.010000] 7: 0
[ 0.010000] Self-IPI invocations:
[ 0.010000] 0: 0
[ 0.010000] 1: 0
[ 0.010000] 2: 0
[ 0.010000] 3: 0
[ 0.010000] 4: 0
[ 0.010000] 5: 0
[ 0.010000] 6: 0
[ 0.010000] 7: 0
[ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
[ 0.010000] 0 Recoveries of "stolen" FPU
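The Status-versus-Cause check suggested earlier in the thread can be done mechanically on the values from the dump above. The following is a minimal sketch, assuming the standard MIPS32 layout (Status.IE/EXL/ERL in bits 0 to 2, Status.IM and Cause.IP in bits 15:8, Cause.TI in bit 30); `struct irq_view`, `decode()` and `ip_bit()` are hypothetical helpers written for this illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ST_IE     (1u << 0)   /* Status.IE: global interrupt enable */
#define ST_EXL    (1u << 1)   /* Status.EXL: exception level        */
#define ST_ERL    (1u << 2)   /* Status.ERL: error level            */
#define IP_SHIFT  8           /* IM7..IM0 / IP7..IP0 sit in bits 15:8 */
#define IP_MASK   0xffu
#define CAUSE_TI  (1u << 30)  /* Cause.TI: timer interrupt pending  */

/* Hypothetical summary of the interrupt-related fields. */
struct irq_view {
	uint8_t pending;          /* Cause.IP7..IP0                  */
	uint8_t enabled;          /* Status.IM7..IM0                 */
	uint8_t pending_masked;   /* pending but not enabled         */
	bool    timer_pending;    /* Cause.TI                        */
	bool    globally_enabled; /* IE set, EXL and ERL both clear  */
};

static struct irq_view decode(uint32_t status, uint32_t cause)
{
	struct irq_view v;

	v.pending = (uint8_t)((cause >> IP_SHIFT) & IP_MASK);
	v.enabled = (uint8_t)((status >> IP_SHIFT) & IP_MASK);
	v.pending_masked = (uint8_t)(v.pending & ~v.enabled);
	v.timer_pending = (cause & CAUSE_TI) != 0;
	v.globally_enabled = (status & (ST_IE | ST_EXL | ST_ERL)) == ST_IE;
	return v;
}

/* Test a single IPn/IMn bit in one of the 8-bit fields above. */
static bool ip_bit(uint8_t bits, int n)
{
	assert(n >= 0 && n <= 7);
	return ((bits >> n) & 1u) != 0;
}
```

Fed the VPE0 values from the first dump (Status 11004001, Cause 40804000), this reports IP6 pending, IM6 enabled, nothing pending-but-masked, and Cause.TI set, which agrees with the observation that the timer bit is high in both registers. With the Status value from the later dumps (18004000) the IM6 bit is still set but IE is clear; whether that reflects the hang itself or only the context in which the dump was taken is not something the numbers alone can settle.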
00c00001 [ 0.010000] TCRestart : ffff7ffe 0xffff7ffe [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00900000 [ 0.010000] Counter Interrupts taken per CPU (TC) [ 0.010000] 0: 0 [ 0.010000] 1: 0 [ 0.010000] 2: 0 [ 0.010000] 3: 0 [ 0.010000] 4: 0 [ 0.010000] 5: 0 [ 0.010000] 6: 0 [ 0.010000] 7: 0 [ 0.010000] Self-IPI invocations: [ 0.010000] 0: 0 [ 0.010000] 1: 0 [ 0.010000] 2: 0 [ 0.010000] 3: 0 [ 0.010000] 4: 0 [ 0.010000] 5: 0 [ 0.010000] 6: 0 [ 0.010000] 7: 0 [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] 0 Recoveries of "stolen" FPU [ 0.010000] =========================== [ 0.010000] === MIPS MT State Dump === [ 0.010000] -- Global State -- [ 0.010000] MVPControl Passed: 00000000 [ 0.010000] MVPControl Read: 00000000 [ 0.010000] MVPConf0 : a8008406 [ 0.010000] -- per-VPE State -- [ 0.010000] VPE 0 [ 0.010000] VPEControl : 00000000 [ 0.010000] VPEConf0 : 800f0003 [ 0.010000] VPE0.Status : 18004000 [ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4 [ 0.010000] VPE0.Cause : 40804000 [ 0.010000] VPE0.Config7 : 00010000 [ 0.010000] VPE 1 [ 0.010000] VPEControl : 00060000 [ 0.010000] VPEConf0 : 800f0000 [ 0.010000] VPE1.Status : 00408305 [ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84 [ 0.010000] VPE1.Cause : 40000200 [ 0.010000] VPE1.Config7 : 00010000 [ 0.010000] -- per-TC State -- [ 0.010000] TC 0 (current TC with VPE EPC above) [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00000000 [ 0.010000] TCRestart : 803f791c printk+0xc/0x30 [ 0.010000] TCHalt : 00000000 [ 0.010000] TCContext : 00000000 [ 0.010000] TC 1 [ 0.010000] TCStatus : 
00000000 [ 0.010000] TCBind : 00200001 [ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4 [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00180000 [ 0.010000] TC 2 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00400001 [ 0.010000] TCRestart : 7ffffffc 0x7ffffffc [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00300000 [ 0.010000] TC 3 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00600001 [ 0.010000] TCRestart : fff7ffae 0xfff7ffae [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00480000 [ 0.010000] TC 4 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00800001 [ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00600000 [ 0.010000] TC 5 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00a00001 [ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00780000 [ 0.010000] TC 6 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00c00001 [ 0.010000] TCRestart : ffff7ffe 0xffff7ffe [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00900000 [ 0.010000] Counter Interrupts taken per CPU (TC) [ 0.010000] 0: 0 [ 0.010000] 1: 0 [ 0.010000] 2: 0 [ 0.010000] 3: 0 [ 0.010000] 4: 0 [ 0.010000] 5: 0 [ 0.010000] 6: 0 [ 0.010000] 7: 0 [ 0.010000] Self-IPI invocations: [ 0.010000] 0: 0 [ 0.010000] 1: 0 [ 0.010000] 2: 0 [ 0.010000] 3: 0 [ 0.010000] 4: 0 [ 0.010000] 5: 0 [ 0.010000] 6: 0 [ 0.010000] 7: 0 [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] 0 Recoveries of "stolen" FPU ^ permalink raw reply [flat|nested] 68+ messages in thread
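The dump above is easier to read once a few raw register values are split into fields. What follows is a minimal sketch, not kernel code: the helper names are made up for illustration, and the field positions are the MIPS MT ASE layouts as I understand them (PTC in MVPConf0 bits 7:0, PVPE in bits 13:10, CurTC in TCBind bits 28:21, CurVPE in bits 3:0, the halt flag in TCHalt bit 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Field positions per the MIPS MT ASE (assumed; check against your headers). */
#define MVPCONF0_PTC_SHIFT   0
#define MVPCONF0_PTC_MASK    0xffu
#define MVPCONF0_PVPE_SHIFT  10
#define MVPCONF0_PVPE_MASK   0xfu
#define TCBIND_CURVPE_SHIFT  0
#define TCBIND_CURVPE_MASK   0xfu
#define TCBIND_CURTC_SHIFT   21
#define TCBIND_CURTC_MASK    0xffu
#define TCHALT_H             0x1u

/* MVPConf0.PTC holds the number of TCs minus one. */
unsigned mvpconf0_num_tcs(uint32_t mvpconf0)
{
    return ((mvpconf0 >> MVPCONF0_PTC_SHIFT) & MVPCONF0_PTC_MASK) + 1;
}

/* MVPConf0.PVPE holds the number of VPEs minus one. */
unsigned mvpconf0_num_vpes(uint32_t mvpconf0)
{
    return ((mvpconf0 >> MVPCONF0_PVPE_SHIFT) & MVPCONF0_PVPE_MASK) + 1;
}

/* TC number recorded in a TCBind value. */
unsigned tcbind_curtc(uint32_t tcbind)
{
    return (tcbind >> TCBIND_CURTC_SHIFT) & TCBIND_CURTC_MASK;
}

/* VPE that the TC is currently bound to. */
unsigned tcbind_curvpe(uint32_t tcbind)
{
    return (tcbind >> TCBIND_CURVPE_SHIFT) & TCBIND_CURVPE_MASK;
}

/* True when the TC is halted. */
bool tchalt_is_halted(uint32_t tchalt)
{
    return (tchalt & TCHALT_H) != 0;
}

/* Hypothetical helper: print one decoded per-TC line of a state dump. */
void print_tc_binding(unsigned tc, uint32_t tcbind, uint32_t tchalt)
{
    assert(tc <= TCBIND_CURTC_MASK);   /* TC numbers fit in the CurTC field */
    printf("TC %u: CurTC=%u CurVPE=%u %s\n", tc,
           tcbind_curtc(tcbind), tcbind_curvpe(tcbind),
           tchalt_is_halted(tchalt) ? "halted" : "running");
}
```

With the values logged above, MVPConf0 = a8008406 decodes to 7 TCs and 2 VPEs, TC 0 (TCBind 00000000, TCHalt 00000000) is running on VPE 0, and TC 1 through TC 6 (TCBind 00200001 through 00c00001, TCHalt 00000001) are bound to VPE 1 and halted.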
* Re: SMTC support status in latest git head. 2010-12-14 15:25 ` Anoop P.A. (?) @ 2010-12-14 18:32 ` Kevin D. Kissell 2010-12-14 18:50 ` Ralf Baechle 2010-12-15 19:18 ` Anoop P A -1 siblings, 2 replies; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-14 18:32 UTC (permalink / raw) To: Anoop P.A.; +Cc: linux-mips

Between your mailer and mine (Thunderbird 3.1 on Ubuntu), the quoting has become something of a dog's breakfast, so let me just lay things out here as best I can.

I can't comment on your tweak to 2.6.24.7 without seeing it as a patch diff.

I am no longer associated with MIPS Technologies and no longer have access to my email archives from that period. If I did, I could tell you which LMO kernel version(s) had SMTC working "out of the box". There definitely was at least one, and I commented on it in an email. You might be able to find it in the LMO email archives, but it's possible that I only sent it to a MIPS internal mailing list.

There was also a message I wrote that I had *thought* had gone to the LMO mailing list, but may have only been sent to a group of internal MIPS and customer engineers, in which I described the recommended procedure for debugging exactly this canonical problem with porting SMTC.

The recommended procedure was, and remains, to isolate clock propagation problems by using command line options "maxtcs=" and "maxvpes=".

First, boot your SMTC kernel with maxtcs=1 and maxvpes=1, a virtual uniprocessor. If that doesn't run, you've got some fundamental problem with support for your platform, or someone has really fundamentally broken the SMTC build somewhere. Next, try booting with maxtcs=2 and maxvpes=1, then with no constraint on maxtcs and maxvpes=1. If those fail, your problem is probably in the interrupt mask management algorithms I described.

On the other hand, if you boot with maxtcs=2 and maxvpes=2, there will be only one TC per VPE and far less vulnerability to interrupt mask lockup, but you need to have cross-VPE IPI interrupts working. The preferred method of doing cross-VPE IPIs would be to use a physical interrupt input that's instantiated per-VPE and manipulable by software. Malta didn't have one, so there's the historical hack of using MIPS MT instructions to freeze the other VPE and set up a software interrupt using MTTR to the remote Cause register. The PMC-Sierra platforms did, if I recall correctly, have some kind of register that one could write to cause a real cross-VPE hardware interrupt, but I don't recall whether it got used in the SMTC port.

Your dump below looks as if it comes from 2 TCs running on 2 VPEs, and that the interrupt mask issues I alluded to earlier are neither relevant nor manifest. It looks instead as if the initialization of "CPU 1" (VPE1/TC1) may not have been done properly. Under normal operation, it would be pretty rare to catch TC 1 in the exception vector dispatch code, so the first hypothesis that comes to mind is that something isn't right in the vector/handler setup, and TC 1 is stuck in an infinite exception loop, unable to handshake with TC 0 and thus locking up the system. But that's just my best guess based on limited data.

Regards,

Kevin K.

On 12/14/10 07:25, Anoop P.A. wrote:
>> it ended up being cleaner and more efficient to have *some* hooks in
>> platform specific timer code. It was there for Malta in the kernel.org
>> mainline once upon a time, and I *thought* we'd propagated working code
>> for the initial PMC-Sierra 34K-based SoC's at least as far as
> [Anoop P.A.]
> I was able to boot 2.6.24-7 git sources with a change in cevt-r4k.c
> (c0_compare_int_pending changed as follows: "return (read_c0_cause() >>
> cp0_compare_irq_shift) & (1ul << CAUSEB_IP)")
>
>> linux-mips.org, but the source tree has been considerably reorganized -
>> there was a time when some of the hooks were under
>> arch/mips/mips-boards/generic, which no longer exists - and I'm not sure
>> where to point you. Git and grep are your friends.
> [Anoop P.A.] Malta code has been moved to arch/mips/mti-malta/
> Can you recollect the version of the l-m-o kernel with known working SMTC
> support?
>
>> The first order of business is to break into that hung timer calibration
>> loop and dump the CP0 registers for the VPE and the TCs, in particular
>> checking the interrupt enable mask in Status against the pending
>> interrupts in the Cause register. If you're seeing the timer
>> interrupt's bit set in Cause, but clear in Status, you need to fix the
>> SMTC interrupt mask hook for your platform timer.
> [Anoop P.A.]
> I tried dumping registers from the calibration while loop.
> It looks like the timer interrupt bit stays high in both the Cause and
> Status registers (in my case the timer interrupt is connected to the
> cascaded CIC interrupt, which is connected to irq -6 (C_IRQ4)). Detailed
> log pasted below.
>
>> check to see if you're building for "tickless" operation. Tickless ends
>> up being really important for SMTC, and I did get it working properly
>> back in 2008, but the SMTC-specific cevt-smtc.c code uses common
>> functions in cevt-r4k.c, and I've seen some patches to cevt-r4k.c going
>> by that I rather doubt were ever tested against an SMTC build/platform.
>> There might have been breakage there, and configuring to use a fixed
>> interval timer (say, 100Hz) would be a way to test that hypothesis.
> [Anoop P.A.] I have tried both tickless and fixed interval timer.
>
>> Regards,
>>
>> Kevin K.
>
> [Anoop P.A.] Thanks much for your and Ralf's detailed response.
> [Anoop P.A.] > [ 0.000000] Writing ErrCtl register=00000000 > [ 0.000000] Readback ErrCtl register=00000000 > [ 0.000000] Memory: 254384k/257912k available (3062k kernel code, > 3528k reserved, 648k data, 200k init, 0k highmem) > [ 0.000000] Preemptable hierarchical RCU implementation. > [ 0.000000] NR_IRQS:128 > [ 0.000000] console [ttyS0] enabled > [ 0.000000] Clock rate set to 600000000 > [ 0.000000] Calibrating delay loop... === MIPS MT State Dump === > [ 0.000000] -- Global State -- > [ 0.000000] MVPControl Passed: 00000000 > [ 0.000000] MVPControl Read: 00000000 > [ 0.000000] MVPConf0 : a8008406 > [ 0.000000] -- per-VPE State -- > [ 0.000000] VPE 0 > [ 0.000000] VPEControl : 00000000 > [ 0.000000] VPEConf0 : 800f0003 > [ 0.000000] VPE0.Status : 11004001 > [ 0.000000] VPE0.EPC : 80100000 _stext+0x0/0x10 > [ 0.000000] VPE0.Cause : 40804000 > [ 0.000000] VPE0.Config7 : 00010000 > [ 0.000000] VPE 1 > [ 0.000000] VPEControl : 00060000 > [ 0.000000] VPEConf0 : 800f0000 > [ 0.000000] VPE1.Status : 00408305 > [ 0.000000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84 > [ 0.000000] VPE1.Cause : 40000200 > [ 0.000000] VPE1.Config7 : 00010000 > [ 0.000000] -- per-TC State -- > [ 0.000000] TC 0 (current TC with VPE EPC above) > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00000000 > [ 0.000000] TCRestart : 8010d860 mips_mt_regdump+0x2f0/0x3c4 > [ 0.000000] TCHalt : 00000000 > [ 0.000000] TCContext : 00000000 > [ 0.000000] TC 1 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00200001 > [ 0.000000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4 > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00180000 > [ 0.000000] TC 2 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00400001 > [ 0.000000] TCRestart : 7ffffffc 0x7ffffffc > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00300000 > [ 0.000000] TC 3 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00600001 > [ 0.000000] TCRestart : fff7ffae 0xfff7ffae > [ 0.000000] TCHalt : 
00000001 > [ 0.000000] TCContext : 00480000 > [ 0.000000] TC 4 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00800001 > [ 0.000000] TCRestart : f3fff7fe 0xf3fff7fe > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00600000 > [ 0.000000] TC 5 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00a00001 > [ 0.000000] TCRestart : 7ffffbfe 0x7ffffbfe > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00780000 > [ 0.000000] TC 6 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00c00001 > [ 0.000000] TCRestart : ffff7ffe 0xffff7ffe > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00900000 > [ 0.000000] Counter Interrupts taken per CPU (TC) > [ 0.000000] 0: 0 > [ 0.000000] 1: 0 > [ 0.000000] 2: 0 > [ 0.000000] 3: 0 > [ 0.000000] 4: 0 > [ 0.000000] 5: 0 > [ 0.000000] 6: 0 > [ 0.000000] 7: 0 > [ 0.000000] Self-IPI invocations: > [ 0.000000] 0: 0 > [ 0.000000] 1: 0 > [ 0.000000] 2: 0 > [ 0.000000] 3: 0 > [ 0.000000] 4: 0 > [ 0.000000] 5: 0 > [ 0.000000] 6: 0 > [ 0.000000] 7: 0 > [ 0.000000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] 0 Recoveries of "stolen" FPU > [ 0.000000] =========================== > [ 0.000000] In platform cic dispatch cic_mask=0x22000 stat=0x2402000f > pend=0x20000 > [ 0.010000] === MIPS MT State Dump === > [ 0.010000] -- Global State -- > [ 0.010000] MVPControl Passed: 00000000 > [ 0.010000] MVPControl Read: 00000000 > [ 0.010000] MVPConf0 : a8008406 > [ 0.010000] -- per-VPE State -- > [ 0.010000] VPE 0 > [ 0.010000] VPEControl : 00000000 > [ 0.010000] VPEConf0 : 800f0003 > [ 0.010000] VPE0.Status : 
18004000 > [ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4 > [ 0.010000] VPE0.Cause : 40804000 > [ 0.010000] VPE0.Config7 : 00010000 > [ 0.010000] VPE 1 > [ 0.010000] VPEControl : 00060000 > [ 0.010000] VPEConf0 : 800f0000 > [ 0.010000] VPE1.Status : 00408305 > [ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84 > [ 0.010000] VPE1.Cause : 40000200 > [ 0.010000] VPE1.Config7 : 00010000 > [ 0.010000] -- per-TC State -- > [ 0.010000] TC 0 (current TC with VPE EPC above) > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00000000 > [ 0.010000] TCRestart : 803f791c printk+0xc/0x30 > [ 0.010000] TCHalt : 00000000 > [ 0.010000] TCContext : 00000000 > [ 0.010000] TC 1 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00200001 > [ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00180000 > [ 0.010000] TC 2 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00400001 > [ 0.010000] TCRestart : 7ffffffc 0x7ffffffc > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00300000 > [ 0.010000] TC 3 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00600001 > [ 0.010000] TCRestart : fff7ffae 0xfff7ffae > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00480000 > [ 0.010000] TC 4 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00800001 > [ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00600000 > [ 0.010000] TC 5 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00a00001 > [ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00780000 > [ 0.010000] TC 6 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00c00001 > [ 0.010000] TCRestart : ffff7ffe 0xffff7ffe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00900000 > [ 0.010000] Counter Interrupts taken per CPU (TC) > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 
> [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] Self-IPI invocations: > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] 0 Recoveries of "stolen" FPU > [ 0.010000] =========================== > [ 0.010000] === MIPS MT State Dump === > [ 0.010000] -- Global State -- > [ 0.010000] MVPControl Passed: 00000000 > [ 0.010000] MVPControl Read: 00000000 > [ 0.010000] MVPConf0 : a8008406 > [ 0.010000] -- per-VPE State -- > [ 0.010000] VPE 0 > [ 0.010000] VPEControl : 00000000 > [ 0.010000] VPEConf0 : 800f0003 > [ 0.010000] VPE0.Status : 18004000 > [ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4 > [ 0.010000] VPE0.Cause : 40804000 > [ 0.010000] VPE0.Config7 : 00010000 > [ 0.010000] VPE 1 > [ 0.010000] VPEControl : 00060000 > [ 0.010000] VPEConf0 : 800f0000 > [ 0.010000] VPE1.Status : 00408305 > [ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84 > [ 0.010000] VPE1.Cause : 40000200 > [ 0.010000] VPE1.Config7 : 00010000 > [ 0.010000] -- per-TC State -- > [ 0.010000] TC 0 (current TC with VPE EPC above) > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00000000 > [ 0.010000] TCRestart : 803f791c printk+0xc/0x30 > [ 0.010000] TCHalt : 00000000 > [ 0.010000] TCContext : 00000000 > [ 0.010000] TC 1 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00200001 > [ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4 > [ 0.010000] TCHalt : 00000001 > [ 
0.010000] TCContext : 00180000 > [ 0.010000] TC 2 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00400001 > [ 0.010000] TCRestart : 7ffffffc 0x7ffffffc > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00300000 > [ 0.010000] TC 3 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00600001 > [ 0.010000] TCRestart : fff7ffae 0xfff7ffae > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00480000 > [ 0.010000] TC 4 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00800001 > [ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00600000 > [ 0.010000] TC 5 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00a00001 > [ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00780000 > [ 0.010000] TC 6 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00c00001 > [ 0.010000] TCRestart : ffff7ffe 0xffff7ffe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00900000 > [ 0.010000] Counter Interrupts taken per CPU (TC) > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] Self-IPI invocations: > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] 0 Recoveries of "stolen" FPU > [ 0.010000] =========================== > [ 0.010000] === MIPS MT State Dump === > [ 
0.010000] -- Global State -- > [ 0.010000] MVPControl Passed: 00000000 > [ 0.010000] MVPControl Read: 00000000 > [ 0.010000] MVPConf0 : a8008406 > [ 0.010000] -- per-VPE State -- > [ 0.010000] VPE 0 > [ 0.010000] VPEControl : 00000000 > [ 0.010000] VPEConf0 : 800f0003 > [ 0.010000] VPE0.Status : 18004000 > [ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4 > [ 0.010000] VPE0.Cause : 40804000 > [ 0.010000] VPE0.Config7 : 00010000 > [ 0.010000] VPE 1 > [ 0.010000] VPEControl : 00060000 > [ 0.010000] VPEConf0 : 800f0000 > [ 0.010000] VPE1.Status : 00408305 > [ 0.010000] VPE1.EPC : 801024e0 except_vec_vi+0x0/0x84 > [ 0.010000] VPE1.Cause : 40000200 > [ 0.010000] VPE1.Config7 : 00010000 > [ 0.010000] -- per-TC State -- > [ 0.010000] TC 0 (current TC with VPE EPC above) > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00000000 > [ 0.010000] TCRestart : 803f791c printk+0xc/0x30 > [ 0.010000] TCHalt : 00000000 > [ 0.010000] TCContext : 00000000 > [ 0.010000] TC 1 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00200001 > [ 0.010000] TCRestart : 80104b64 copy_thread+0x2ac/0x2b4 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00180000 > [ 0.010000] TC 2 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00400001 > [ 0.010000] TCRestart : 7ffffffc 0x7ffffffc > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00300000 > [ 0.010000] TC 3 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00600001 > [ 0.010000] TCRestart : fff7ffae 0xfff7ffae > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00480000 > [ 0.010000] TC 4 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00800001 > [ 0.010000] TCRestart : f3fff7fe 0xf3fff7fe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00600000 > [ 0.010000] TC 5 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00a00001 > [ 0.010000] TCRestart : 7ffffbfe 0x7ffffbfe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00780000 > [ 0.010000] TC 6 > [ 
0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00c00001 > [ 0.010000] TCRestart : ffff7ffe 0xffff7ffe > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00900000 > [ 0.010000] Counter Interrupts taken per CPU (TC) > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] Self-IPI invocations: > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] 0 Recoveries of "stolen" FPU > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
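The check described in the message above, comparing the interrupt enable mask in Status against the pending interrupts in Cause, can be sketched in a few lines of C. This is an illustrative sketch rather than kernel code: the function names are hypothetical, and the bit positions are the standard MIPS32 ones (IM in Status bits 15:8, IP in Cause bits 15:8, TI in Cause bit 30, IE/EXL/ERL in Status bits 0 to 2). It deliberately ignores the per-TC TCStatus.IXMT bit, which SMTC also uses to inhibit interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS32 CP0 Status and Cause bits. */
#define ST0_IE     0x00000001u   /* global interrupt enable */
#define ST0_EXL    0x00000002u   /* exception level */
#define ST0_ERL    0x00000004u   /* error level */
#define ST0_IM     0x0000ff00u   /* interrupt mask, IM7..IM0 */
#define CAUSEF_IP  0x0000ff00u   /* interrupt pending, IP7..IP0 */
#define CAUSEF_TI  (1u << 30)    /* timer interrupt pending */

/* IP lines that are pending in Cause and also enabled in Status.IM. */
uint32_t deliverable_ip(uint32_t status, uint32_t cause)
{
    return (status & cause & CAUSEF_IP) >> 8;
}

/* IP lines that are pending in Cause but masked off in Status.IM.
 * A timer line showing up here is the symptom described in the thread. */
uint32_t masked_pending_ip(uint32_t status, uint32_t cause)
{
    return (cause & ~status & CAUSEF_IP) >> 8;
}

/* Interrupts are only taken when IE is set and EXL and ERL are clear. */
bool irqs_globally_enabled(uint32_t status)
{
    return (status & ST0_IE) && !(status & (ST0_EXL | ST0_ERL));
}

/* Cause.TI reports a pending count/compare timer interrupt. */
bool timer_pending(uint32_t cause)
{
    return (cause & CAUSEF_TI) != 0;
}

/* Test a single IP line, 0 to 7, in a Cause value. */
bool ip_line_pending(uint32_t cause, unsigned line)
{
    assert(line < 8);
    return ((cause >> (8 + line)) & 1u) != 0;
}
```

Fed the first dump's VPE0 values (Status 11004001, Cause 40804000), this reports IP6 as both pending and enabled, nothing pending-but-masked, TI set, and interrupts globally enabled. For VPE1 (Status 00408305, Cause 40000200) it reports IP1 pending and enabled in IM, but ERL set in Status, so interrupts are not globally enabled there.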
* Re: SMTC support status in latest git head. 2010-12-14 18:32 ` Kevin D. Kissell @ 2010-12-14 18:50 ` Ralf Baechle 2010-12-15 19:18 ` Anoop P A 1 sibling, 0 replies; 68+ messages in thread From: Ralf Baechle @ 2010-12-14 18:50 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: Anoop P.A., linux-mips

On Tue, Dec 14, 2010 at 10:32:57AM -0800, Kevin D. Kissell wrote:

> I am no longer associated with MIPS Technologies and no longer have
> access to my email archives from that period. If I did, I could tell you
> which LMO kernel version(s) had SMTC working "out of the box". There
> definitely was at least one, and I commented on it in an email. You
> might be able to find it in the LMO email archives, but it's possible that
> I only sent it to a MIPS internal mailing list.
>
> There was also a message I wrote that I had *thought* had gone to
> the LMO mailing list, but may have only been sent to a group of internal
> MIPS and customer engineers, in which I described the recommended
> procedure for debugging exactly this canonical problem with porting
> SMTC.

git bisect to the rescue :) It's time-consuming on a slow machine but perfectly doable. Go back, find some antique kernel version with functioning SMTC and take it from there.

  Ralf

^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-14 18:32 ` Kevin D. Kissell 2010-12-14 18:50 ` Ralf Baechle @ 2010-12-15 19:18 ` Anoop P A 2010-12-15 19:58 ` Kevin D. Kissell 1 sibling, 1 reply; 68+ messages in thread From: Anoop P A @ 2010-12-15 19:18 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: Anoop P.A., linux-mips

On Tue, 2010-12-14 at 10:32 -0800, Kevin D. Kissell wrote:
> Between your mailer and mine (Thunderbird 3.1 on Ubuntu), the quoting
> has become something of a dog's breakfast, so let me just lay things out
> here as best I can.

I am sorry for that. With Evolution it will be better, I hope.

> I can't comment on your tweak to 2.6.24.7 without seeing it as a patch
> diff.

http://patchwork.linux-mips.org/patch/804/

I was speaking about this patch. Since my timer is connected through a cascaded CIC, it is necessary to check the TI bit of the Cause register in order to confirm a timer interrupt. With the above-mentioned patch I was able to boot a 2.6.24-stable SMTC kernel (not fully tested, though).

> The recommended procedure was, and remains, to isolate clock
> propagation problems by using command line options "maxtcs="
> and "maxvpes=".
>
> First, boot your SMTC kernel with maxtcs=1 and maxvpes=1,
> a virtual uniprocessor. If that doesn't run, you've got some fundamental
> problem with support for your platform, or someone has really fundamentally
> broken the SMTC build somewhere. Next, try booting with maxtcs=2
> and maxvpes=1, then with no constraint on maxtcs and maxvpes=1.
> If those fail, your problem is probably in the interrupt mask
> management algorithms I described

Even with maxtcs=1 and maxvpes=1 on the command line I am seeing the same hang. The register dump is copied below.

> Your dump below looks as if it comes from 2 TCs running on
> 2 VPEs, and that the interrupt mask issues I alluded to earlier
> are neither relevant nor manifest. It looks instead as if the
> initialization of "CPU 1" (VPE1/TC1) may not have been done
> properly. Under normal operation, it would be pretty rare to
> catch TC 1 in the exception vector dispatch code, so the first
> hypothesis that comes to mind is that something isn't right in
> the vector/handler setup, and TC 1 is stuck in an infinite exception
> loop, unable to handshake with TC 0 and thus locking up the
> system. But that's just my best guess based on limited data.
>
> Regards,
>
> Kevin K.
>

I have tested a few stable tags in git and isolated the code break.

2.6.24-stable + patch[1] = SMTC boot success
2.6.29-stable + patch[1] = SMTC boot success
2.6.31-stable + patch[1] = SMTC boot success
2.6.32-stable + patch[1] = SMTC boot success
2.6.33-stable = SMTC boot failed
2.6.35-stable = SMTC boot failed

So it looks like SMTC support got broken between 2.6.32 and 2.6.33.

Thanks and Regards,
Anoop

patch[1] : http://patchwork.linux-mips.org/patch/804/

#############################Log###########################
0.000000] Calibrating delay loop... === MIPS MT State Dump === [ 0.000000] -- Global State -- [ 0.000000] MVPControl Passed: 00000000 [ 0.000000] MVPControl Read: 00000000 [ 0.000000] MVPConf0 : a8008406 [ 0.000000] -- per-VPE State -- [ 0.000000] VPE 0 [ 0.000000] VPEControl : 00000000 [ 0.000000] VPEConf0 : 800f0003 [ 0.000000] VPE0.Status : 11004001 [ 0.000000] VPE0.EPC : 80100000 _stext+0x0/0x10 [ 0.000000] VPE0.Cause : 50804000 [ 0.000000] VPE0.Config7 : 00010000 [ 0.000000] VPE 1 [ 0.000000] VPEControl : 00060000 [ 0.000000] VPEConf0 : 800f0000 [ 0.000000] VPE1.Status : 00408305 [ 0.000000] VPE1.EPC : 80100380 name_to_dev_t+0x50/0x430 [ 0.000000] VPE1.Cause : 50000200 [ 0.000000] VPE1.Config7 : 00010000 [ 0.000000] -- per-TC State -- [ 0.000000] TC 0 (current TC with VPE EPC above) [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00000000 [ 0.000000] TCRestart : 8010d860 mips_mt_regdump+0x2f0/0x3c4 [ 0.000000] TCHalt : 00000000 [ 0.000000] TCContext : 00000000 [ 0.000000] TC 1 [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00200001 [
0.000000] TCRestart : 8f800020 0x8f800020 [ 0.000000] TCHalt : 00000001 [ 0.000000] TCContext : 00140000 [ 0.000000] TC 2 [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00400001 [ 0.000000] TCRestart : 8f800020 0x8f800020 [ 0.000000] TCHalt : 00000001 [ 0.000000] TCContext : 00280000 [ 0.000000] TC 3 [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00600001 [ 0.000000] TCRestart : 8f800020 0x8f800020 [ 0.000000] TCHalt : 00000001 [ 0.000000] TCContext : 003c0000 [ 0.000000] TC 4 [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00800001 [ 0.000000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 [ 0.000000] TCHalt : 00000001 [ 0.000000] TCContext : 00500000 [ 0.000000] TC 5 [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00a00001 [ 0.000000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 [ 0.000000] TCHalt : 00000001 [ 0.000000] TCContext : 00640000 [ 0.000000] TC 6 [ 0.000000] TCStatus : 00000000 [ 0.000000] TCBind : 00c00001 [ 0.000000] TCRestart : 80268e00 aes_encrypt+0x10e4/0x164c [ 0.000000] TCHalt : 00000001 [ 0.000000] TCContext : 00780000 [ 0.000000] Counter Interrupts taken per CPU (TC) [ 0.000000] 0: 0 [ 0.000000] 1: 0 [ 0.000000] 2: 0 [ 0.000000] 3: 0 [ 0.000000] 4: 0 [ 0.000000] 5: 0 [ 0.000000] 6: 0 [ 0.000000] 7: 0 [ 0.000000] Self-IPI invocations: [ 0.000000] 0: 0 [ 0.000000] 1: 0 [ 0.000000] 2: 0 [ 0.000000] 3: 0 [ 0.000000] 4: 0 [ 0.000000] 5: 0 [ 0.000000] 6: 0 [ 0.000000] 7: 0 [ 0.000000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 [ 0.000000] 0 Recoveries of "stolen" FPU [ 0.000000] =========================== [ 0.000000] In platform cic dispatch 
cic_mask=0x22000 stat=0x2402000f pend=0x20000 [ 0.010000] === MIPS MT State Dump === [ 0.010000] -- Global State -- [ 0.010000] MVPControl Passed: 00000000 [ 0.010000] MVPControl Read: 00000000 [ 0.010000] MVPConf0 : a8008406 [ 0.010000] -- per-VPE State -- [ 0.010000] VPE 0 [ 0.010000] VPEControl : 00000000 [ 0.010000] VPEConf0 : 800f0003 [ 0.010000] VPE0.Status : 18004000 [ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4 [ 0.010000] VPE0.Cause : 40804000 [ 0.010000] VPE0.Config7 : 00010000 [ 0.010000] VPE 1 [ 0.010000] VPEControl : 00060000 [ 0.010000] VPEConf0 : 800f0000 [ 0.010000] VPE1.Status : 00408305 [ 0.010000] VPE1.EPC : 80100380 name_to_dev_t+0x50/0x430 [ 0.010000] VPE1.Cause : 50000200 [ 0.010000] VPE1.Config7 : 00010000 [ 0.010000] -- per-TC State -- [ 0.010000] TC 0 (current TC with VPE EPC above) [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00000000 [ 0.010000] TCRestart : 803f791c printk+0xc/0x30 [ 0.010000] TCHalt : 00000000 [ 0.010000] TCContext : 00000000 [ 0.010000] TC 1 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00200001 [ 0.010000] TCRestart : 8f800020 0x8f800020 [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00140000 [ 0.010000] TC 2 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00400001 [ 0.010000] TCRestart : 8f800020 0x8f800020 [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00280000 [ 0.010000] TC 3 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00600001 [ 0.010000] TCRestart : 8f800020 0x8f800020 [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 003c0000 [ 0.010000] TC 4 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00800001 [ 0.010000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00500000 [ 0.010000] TC 5 [ 0.010000] TCStatus : 00000000 [ 0.010000] TCBind : 00a00001 [ 0.010000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00640000 [ 0.010000] TC 6 [ 0.010000] 
TCStatus : 00000000 [ 0.010000] TCBind : 00c00001 [ 0.010000] TCRestart : 80268e00 aes_encrypt+0x10e4/0x164c [ 0.010000] TCHalt : 00000001 [ 0.010000] TCContext : 00780000 [ 0.010000] Counter Interrupts taken per CPU (TC) [ 0.010000] 0: 0 [ 0.010000] 1: 0 [ 0.010000] 2: 0 [ 0.010000] 3: 0 [ 0.010000] 4: 0 [ 0.010000] 5: 0 [ 0.010000] 6: 0 [ 0.010000] 7: 0 [ 0.010000] Self-IPI invocations: [ 0.010000] 0: 0 [ 0.010000] 1: 0 [ 0.010000] 2: 0 [ 0.010000] 3: 0 [ 0.010000] 4: 0 [ 0.010000] 5: 0 [ 0.010000] 6: 0 [ 0.010000] 7: 0 [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 [ 0.010000] 0 Recoveries of "stolen" FPU [ 0.010000] =========================== ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: SMTC support status in latest git head. 2010-12-15 19:18 ` Anoop P A @ 2010-12-15 19:58 ` Kevin D. Kissell 2010-12-16 13:03 ` Anoop P A 0 siblings, 1 reply; 68+ messages in thread From: Kevin D. Kissell @ 2010-12-15 19:58 UTC (permalink / raw) To: Anoop P A; +Cc: Anoop P.A., linux-mips On 12/15/10 11:18, Anoop P A wrote: > On Tue, 2010-12-14 at 10:32 -0800, Kevin D. Kissell wrote: > >> I can't comment on your tweak to 2.6.24.7 without seeing it as a patch >> diff. > http://patchwork.linux-mips.org/patch/804/ I was speaking about this > patch. Since my timer is connected through a cascaded CIC , It is > required to check TI bit of cause register in order to ensure a timer > interrupt. With above mentioned patch I was able to boot a 2.6.24-stable > SMTC kernel. ( Not tested fully though ) OK, yes, of course, you'd need that patch. >> The recommended procedure was, and remains, to isolate clock >> propagation problems by using command line options "maxtcs=" >> and "maxvpes=". >> >> First, boot your SMTC kernel with maxtcs=1 and maxvpes=1, >> a virtual uniprocessor. If that doesn't run, you've got some fundamental >> problem with support for your platform, or someone has really fundamentally >> broken the SMTC build somewhere. Next, try booting with maxtcs=2 >> and maxvpes=1, then with no constraint on maxtcs and maxvpes=1. >> If those fail, your problem is probably in the interrupt mask >> management algorithms I described > Even with command line maxtcs=1 and maxvpes=1 I am seeing same hung. The > register dump is copied below. I guess what jumps out at me is that VPE0.EPC doesn't look to have changed since the very initial boot vector, as if we'd never successfully taken an exception or interrupt of any kind, prior to the NMI (I'm assuming you're getting that MT state dump by breaking in with an NMI). I'm puzzled that TC0.TCStatus is being reported as 0, when it should have a bunch of bits in common with VPE0.Status. 
And I'm particularly intrigued by the fact that you seem to have an interrupt bit set in Cause which is enabled in Status, with IE set and EXL/ERL clear, yet you don't seem to be getting interrupts. Do you have access to some kind of EJTAG probe for your system? > I have tested few stable tags in git and isolated the code brake. > > 2.6.24-stable + patch[1] = SMTC boot success > 2.6.29-stable + patch[1] = SMTC boot success > 2.6.31-stable + patch[1] = SMTC boot success > 2.6.32-stable + patch[1] = SMTC boot success > 2.6.33-stable = SMTC boot failed > 2.6.35-stable = SMTC boot failed > > So it looks like SMTC support got broke between 2.6.32 and 2.6.33 . That's a pretty good job of isolating the problem, and the fact that it happens even with no TC or VPE concurrency means it's not a failure of the SMTC logic per se, but that someone changed some code that's common to SMTC and "normal"/SMP operation in a way that breaks the more constrained assumptions of SMTC. > Thanks and Regards, > Anoop > > patch[1] : http://patchwork.linux-mips.org/patch/804/ > > > #############################Log########################### > 0.000000] Calibrating delay loop... 
=== MIPS MT State Dump === > [ 0.000000] -- Global State -- > [ 0.000000] MVPControl Passed: 00000000 > [ 0.000000] MVPControl Read: 00000000 > [ 0.000000] MVPConf0 : a8008406 > [ 0.000000] -- per-VPE State -- > [ 0.000000] VPE 0 > [ 0.000000] VPEControl : 00000000 > [ 0.000000] VPEConf0 : 800f0003 > [ 0.000000] VPE0.Status : 11004001 > [ 0.000000] VPE0.EPC : 80100000 _stext+0x0/0x10 > [ 0.000000] VPE0.Cause : 50804000 > [ 0.000000] VPE0.Config7 : 00010000 > [ 0.000000] VPE 1 > [ 0.000000] VPEControl : 00060000 > [ 0.000000] VPEConf0 : 800f0000 > [ 0.000000] VPE1.Status : 00408305 > [ 0.000000] VPE1.EPC : 80100380 name_to_dev_t+0x50/0x430 > [ 0.000000] VPE1.Cause : 50000200 > [ 0.000000] VPE1.Config7 : 00010000 > [ 0.000000] -- per-TC State -- > [ 0.000000] TC 0 (current TC with VPE EPC above) > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00000000 > [ 0.000000] TCRestart : 8010d860 mips_mt_regdump+0x2f0/0x3c4 > [ 0.000000] TCHalt : 00000000 > [ 0.000000] TCContext : 00000000 > [ 0.000000] TC 1 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00200001 > [ 0.000000] TCRestart : 8f800020 0x8f800020 > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00140000 > [ 0.000000] TC 2 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00400001 > [ 0.000000] TCRestart : 8f800020 0x8f800020 > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00280000 > [ 0.000000] TC 3 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00600001 > [ 0.000000] TCRestart : 8f800020 0x8f800020 > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 003c0000 > [ 0.000000] TC 4 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00800001 > [ 0.000000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00500000 > [ 0.000000] TC 5 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00a00001 > [ 0.000000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 > [ 0.000000] TCHalt : 00000001 > [ 0.000000] 
TCContext : 00640000 > [ 0.000000] TC 6 > [ 0.000000] TCStatus : 00000000 > [ 0.000000] TCBind : 00c00001 > [ 0.000000] TCRestart : 80268e00 aes_encrypt+0x10e4/0x164c > [ 0.000000] TCHalt : 00000001 > [ 0.000000] TCContext : 00780000 > [ 0.000000] Counter Interrupts taken per CPU (TC) > [ 0.000000] 0: 0 > [ 0.000000] 1: 0 > [ 0.000000] 2: 0 > [ 0.000000] 3: 0 > [ 0.000000] 4: 0 > [ 0.000000] 5: 0 > [ 0.000000] 6: 0 > [ 0.000000] 7: 0 > [ 0.000000] Self-IPI invocations: > [ 0.000000] 0: 0 > [ 0.000000] 1: 0 > [ 0.000000] 2: 0 > [ 0.000000] 3: 0 > [ 0.000000] 4: 0 > [ 0.000000] 5: 0 > [ 0.000000] 6: 0 > [ 0.000000] 7: 0 > [ 0.000000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[1]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > [ 0.000000] 0 Recoveries of "stolen" FPU > [ 0.000000] =========================== > [ 0.000000] In platform cic dispatch cic_mask=0x22000 stat=0x2402000f > pend=0x20000 > [ 0.010000] === MIPS MT State Dump === > [ 0.010000] -- Global State -- > [ 0.010000] MVPControl Passed: 00000000 > [ 0.010000] MVPControl Read: 00000000 > [ 0.010000] MVPConf0 : a8008406 > [ 0.010000] -- per-VPE State -- > [ 0.010000] VPE 0 > [ 0.010000] VPEControl : 00000000 > [ 0.010000] VPEConf0 : 800f0003 > [ 0.010000] VPE0.Status : 18004000 > [ 0.010000] VPE0.EPC : 8010d900 mips_mt_regdump+0x390/0x3c4 > [ 0.010000] VPE0.Cause : 40804000 > [ 0.010000] VPE0.Config7 : 00010000 > [ 0.010000] VPE 1 > [ 0.010000] VPEControl : 00060000 > [ 0.010000] VPEConf0 : 800f0000 > [ 0.010000] VPE1.Status : 00408305 > [ 0.010000] VPE1.EPC : 80100380 name_to_dev_t+0x50/0x430 > [ 0.010000] VPE1.Cause : 50000200 > [ 0.010000] VPE1.Config7 : 
00010000 > [ 0.010000] -- per-TC State -- > [ 0.010000] TC 0 (current TC with VPE EPC above) > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00000000 > [ 0.010000] TCRestart : 803f791c printk+0xc/0x30 > [ 0.010000] TCHalt : 00000000 > [ 0.010000] TCContext : 00000000 > [ 0.010000] TC 1 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00200001 > [ 0.010000] TCRestart : 8f800020 0x8f800020 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00140000 > [ 0.010000] TC 2 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00400001 > [ 0.010000] TCRestart : 8f800020 0x8f800020 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00280000 > [ 0.010000] TC 3 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00600001 > [ 0.010000] TCRestart : 8f800020 0x8f800020 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 003c0000 > [ 0.010000] TC 4 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00800001 > [ 0.010000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00500000 > [ 0.010000] TC 5 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00a00001 > [ 0.010000] TCRestart : 80100380 name_to_dev_t+0x50/0x430 > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00640000 > [ 0.010000] TC 6 > [ 0.010000] TCStatus : 00000000 > [ 0.010000] TCBind : 00c00001 > [ 0.010000] TCRestart : 80268e00 aes_encrypt+0x10e4/0x164c > [ 0.010000] TCHalt : 00000001 > [ 0.010000] TCContext : 00780000 > [ 0.010000] Counter Interrupts taken per CPU (TC) > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] Self-IPI invocations: > [ 0.010000] 0: 0 > [ 0.010000] 1: 0 > [ 0.010000] 2: 0 > [ 0.010000] 3: 0 > [ 0.010000] 4: 0 > [ 0.010000] 5: 0 > [ 0.010000] 6: 0 > [ 0.010000] 7: 0 > [ 0.010000] IPIQ[0]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[1]: head = 0x0, tail = 
0x0, depth = 0 > [ 0.010000] IPIQ[2]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[3]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[4]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[5]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[6]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] IPIQ[7]: head = 0x0, tail = 0x0, depth = 0 > [ 0.010000] 0 Recoveries of "stolen" FPU > [ 0.010000] =========================== > > > ^ permalink raw reply [flat|nested] 68+ messages in thread
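The Status/Cause observation in the reply above (an interrupt bit set in Cause, enabled in Status, with IE set and EXL/ERL clear) can be checked by hand against the dumped values. What follows is only a minimal sketch in C, not code from the kernel: the macro and function names are made up for illustration, and the bit positions are the standard MIPS32 Status/Cause fields. It deliberately ignores the MT-specific interrupt inhibit in TCStatus, which also matters under SMTC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS32 CP0 Status/Cause fields; the names are illustrative only. */
#define SKETCH_ST_IE    0x00000001u  /* Status.IE: global interrupt enable   */
#define SKETCH_ST_EXL   0x00000002u  /* Status.EXL: exception level          */
#define SKETCH_ST_ERL   0x00000004u  /* Status.ERL: error level              */
#define SKETCH_IM_MASK  0x0000ff00u  /* Status.IM7..IM0 and Cause.IP7..IP0   */
#define SKETCH_CAUSE_TI 0x40000000u  /* Cause.TI: timer interrupt pending    */

/* Interrupt lines that are both pending in Cause and unmasked in Status,
 * returned as an 8-bit field where bit n corresponds to IPn/IMn. */
static uint32_t pending_enabled(uint32_t status, uint32_t cause)
{
    return ((status & SKETCH_IM_MASK) & (cause & SKETCH_IM_MASK)) >> 8;
}

/* True when Status alone would allow an interrupt to be taken. */
static bool interrupts_deliverable(uint32_t status)
{
    return (status & SKETCH_ST_IE) != 0 &&
           (status & (SKETCH_ST_EXL | SKETCH_ST_ERL)) == 0;
}

/* True when the Cause value reports a pending timer interrupt. */
static bool timer_pending(uint32_t cause)
{
    return (cause & SKETCH_CAUSE_TI) != 0;
}
```

Fed the first dump's VPE0.Status (11004001) and VPE0.Cause (50804000), pending_enabled() reports line 6 and interrupts_deliverable() is true, which is the combination described as puzzling above. VPE1.Status (00408305) has ERL set, so the same check reports VPE 1 as not deliverable.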
* Re: SMTC support status in latest git head.
  2010-12-15 19:58 ` Kevin D. Kissell
@ 2010-12-16 13:03 ` Anoop P A
  2010-12-16 18:43 ` Kevin D. Kissell
  0 siblings, 1 reply; 68+ messages in thread
From: Anoop P A @ 2010-12-16 13:03 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: Anoop P.A., linux-mips

On Wed, 2010-12-15 at 11:58 -0800, Kevin D. Kissell wrote:
> On 12/15/10 11:18, Anoop P A wrote:
> >> management algorithms I described
> > Even with command line maxtcs=1 and maxvpes=1 I am seeing same hung. The
> > register dump is copied below.
> I guess what jumps out at me is that VPE0.EPC doesn't look to have
> changed since the very initial boot vector, as if we'd never successfully
> taken an exception or interrupt of any kind, prior to the NMI (I'm assuming
> you're getting that MT state dump by breaking in with an NMI).
> I'm puzzled that TC0.TCStatus is being reported as 0, when it should
> have a bunch of bits in common with VPE0.Status. And I'm particularly
> intrigued by the fact that you seem to have an interrupt bit set in Cause
> which is enabled in Status, with IE set and EXL/ERL clear, yet you don't
> seem to be getting interrupts.
>
> Do you have access to some kind of EJTAG probe for your system?

Unfortunately I don't have access to a working EJTAG at the moment.

>
> > I have tested few stable tags in git and isolated the code brake.
> >
> > 2.6.24-stable + patch[1] = SMTC boot success
> > 2.6.29-stable + patch[1] = SMTC boot success
> > 2.6.31-stable + patch[1] = SMTC boot success
> > 2.6.32-stable + patch[1] = SMTC boot success
> > 2.6.33-stable = SMTC boot failed
> > 2.6.35-stable = SMTC boot failed
> >
> > So it looks like SMTC support got broke between 2.6.32 and 2.6.33 .
> That's a pretty good job of isolating the problem, and the fact
> that it happens even with no TC or VPE concurrency means it's
> not a failure of the SMTC logic per se, but that someone changed
> some code that's common to SMTC and "normal"/SMP operation
> in a way that breaks the more constrained assumptions of SMTC.
>

I have tried digging through the diff between 2.6.32 and 2.6.33, but I
couldn't spot any likely causes.

I forgot to mention that I can boot newer kernels in both VSMP and UP
modes.

The other thing I have tried is booting the kernel with a preset lpj (just
to test how far I can go), which led me to a DSP exception (spurious?).

Let me know if you have any thoughts.

Thanks,
Anoop

################# log #############

Linux version 2.6.33.7-pmc (paanoop1@paanoop1-desktop) (gcc version 4.5.1 (GCC) ) #27 SMP PREEMPT Thu Dec 16 17:49:46 IST 2010
DSPRAM0: PA=1c100000,Size=00008000,enabled
UART clock set to 50000000
CPU revision is: 00019548 (MIPS 34Kc)
Determined physical RAM map:
 memory: 00001000 @ 00000000 (reserved)
 memory: 000ff000 @ 00001000 (usable)
 memory: 00271000 @ 00100000 (reserved)
 memory: 0fc5a200 @ 00371000 (usable)
Wasting 32 bytes for tracking 1 unused pages
Zone PFN ranges:
  Normal 0x00000000 -> 0x0000ffcb
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0: 0x00000000 -> 0x0000ffcb
6 available secondary CPU TC(s)
PERCPU: Embedded 7 pages/cpu @81203000 s4896 r8192 d15584 u65536
pcpu-alloc: s4896 r8192 d15584 u65536 alloc=16*4096
pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64971
Kernel command line: console=ttyS0,57600 lpj=796672
PID hash table entries: 1024 (order: 0, 4096 bytes)
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
Primary data cache 64kB, 4-way, PIPT, no aliases, linesize 32 bytes
Writing ErrCtl register=00000000
Readback ErrCtl register=00000000
Memory: 255548k/259428k available (1861k kernel code, 3504k reserved, 400k data, 156k init, 0k highmem)
Hierarchical RCU implementation.
NR_IRQS:128
Clock rate set to 600000000
console [ttyS0] enabled
Calibrating delay loop (skipped) preset value.. 398.33 BogoMIPS (lpj=796672)
Mount-cache hash table entries: 512
Cpu 0
$ 0 : 00000000 10102000 00000010 00000003
$ 4 : 00000003 00000000 00000000 8f82f758
$ 8 : 00000000 00000000 00000000 00000000
$12 : 00000000 00000007 8f82301c 00000000
$16 : 8f82f758 00800b00 8035d3c0 8f830000
$20 : 80329df8 00000000 8035d3c0 80360000
$24 : 00000000 00000001
$28 : 80328000 80329ce0 8f82f868 8010d018
Hi : 0000004c
Lo : 3831f4b4
epc : 8010d054 copy_thread+0x88/0x348 Not tainted
ra : 8010d018 copy_thread+0x4c/0x348
Status: 10102000 KERNEL
Cause : 50804068
PrId : 00019548 (MIPS 34Kc)
Kernel panic - not syncing: Unexpected DSP exception

^ permalink raw reply [flat|nested] 68+ messages in thread
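As a cross-check on whether the DSP exception in the panic above is really spurious, the Status and Cause words from the register dump can be decoded by hand. The following is a minimal sketch in C under stated assumptions: the names are hypothetical and not taken from the kernel source, while the field positions (Cause.ExcCode in bits 6..2, exception code 26 for "DSP ASE state disabled", Status.MX in bit 24) are the architectural MIPS32 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural field positions; the macro names are illustrative only. */
#define SKETCH_EXCCODE_SHIFT 2
#define SKETCH_EXCCODE_MASK  0x1fu
#define SKETCH_EXC_DSPDIS    26u          /* DSP ASE state disabled      */
#define SKETCH_ST_MX         0x01000000u  /* Status.MX: DSP ASE access   */

/* Extract the exception code from a CP0 Cause value. */
static uint32_t cause_exccode(uint32_t cause)
{
    return (cause >> SKETCH_EXCCODE_SHIFT) & SKETCH_EXCCODE_MASK;
}

/* True when the Status value grants access to the DSP ASE state. */
static bool dsp_access_enabled(uint32_t status)
{
    return (status & SKETCH_ST_MX) != 0;
}

/* True when Cause reports a DSP-disabled exception while Status.MX is
 * clear, i.e. the exception is consistent with the saved Status word. */
static bool dsp_exception_consistent(uint32_t status, uint32_t cause)
{
    return cause_exccode(cause) == SKETCH_EXC_DSPDIS &&
           !dsp_access_enabled(status);
}
```

With the panic's Cause (50804068) the exception code comes out as 26, and the saved Status (10102000) has MX clear, so under this reading the exception is consistent with the register state rather than spurious. The open question is why MX is clear at that point, given that VPE0.Status in the earlier boot dump (11004001) has it set.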
* Re: SMTC support status in latest git head.
  2010-12-16 13:03 ` Anoop P A
@ 2010-12-16 18:43 ` Kevin D. Kissell
  0 siblings, 0 replies; 68+ messages in thread
From: Kevin D. Kissell @ 2010-12-16 18:43 UTC (permalink / raw)
  To: Anoop P A; +Cc: Anoop P.A., linux-mips

Getting back to my previous comment, the value reported for TC0's TCStatus register in the MT register dump can't be right. There are bits that are literally the same flip-flops between TCStatus and the containing VPE's Status register, and those bits are turning up different. If the reporting is wrong, then one of the underlying assumptions of the dump code must have been broken.

Taking a quick look at it - which is all the time I have for it today - I note with alarm that the TCStatus value reported for the TC currently executing comes from the "flags" variable used in the local_irq_save(flags) statement at the beginning of the dump code. That historically worked, because local_irq_save(x) propagated not only the interrupt enable bit (bit 0) in x, but the entire value of Status - or TCStatus in the case of SMTC. It certainly looks as if that's no longer true.

I'm pretty sure that the dump function isn't the only place where the knowledge of local_irq_save()'s implementation was exploited by SMTC code. So you should look for changes to the local_irq_save() macro definitions between 2.6.32 and 2.6.33. The fact that you're blowing up on a DSP exception after you force an exit from the timer calibration loop might also be attributable to TCStatus getting trashed, accidentally clearing access rights to the DSP ASE state.

Honestly, just how many lines changed under arch/mips (and include/asm-mips, if it was still outside arch/mips) between 2.6.32 and 2.6.33? There simply can't be that many to review.

Regards,

Kevin K.

On 12/16/10 05:03, Anoop P A wrote:
> On Wed, 2010-12-15 at 11:58 -0800, Kevin D.
Kissell wrote: >> On 12/15/10 11:18, Anoop P A wrote: >>>> management algorithms I described >>> Even with command line maxtcs=1 and maxvpes=1 I am seeing same hung. The >>> register dump is copied below. >> I guess what jumps out at me is that VPE0.EPC doesn't look to have >> changed since the very initial boot vector, as if we'd never successfully >> taken an exception or interrupt of any kind, prior to the NMI (I'm assuming >> you're getting that MT state dump by breaking in with an NMI). >> I'm puzzled that TC0.TCStatus is being reported as 0, when it should >> have a bunch of bits in common with VPE0.Status. And I'm particularly >> intrigued by the fact that you seem to have an interrupt bit set in Cause >> which is enabled in Status, with IE set and EXL/ERL clear, yet you don't >> seem to be getting interrupts. >> >> Do you have access to some kind of EJTAG probe for your system? > Unfortunately I don't have access to a working EJTAG at the moment. > >>> I have tested few stable tags in git and isolated the code brake. >>> >>> 2.6.24-stable + patch[1] = SMTC boot success >>> 2.6.29-stable + patch[1] = SMTC boot success >>> 2.6.31-stable + patch[1] = SMTC boot success >>> 2.6.32-stable + patch[1] = SMTC boot success >>> 2.6.33-stable = SMTC boot failed >>> 2.6.35-stable = SMTC boot failed >>> >>> So it looks like SMTC support got broke between 2.6.32 and 2.6.33 . >> That's a pretty good job of isolating the problem, and the fact >> that it happens even with no TC or VPE concurrency means it's >> not a failure of the SMTC logic per se, but that someone changed >> some code that's common to SMTC and "normal"/SMP operation >> in a way that breaks the more constrained assumptions of SMTC. >> > I have tried digging diff between 2.6.32 and 2.6.33 but I couldn't spot > any likely causes. > > I forgot to mention that I can boot newer kernels both in VSMP and UP > mode. 
> > The other thing I have tried is booting kernel with pre-set lpj ( Just > to test how far I can go), which lead me to a dsp exception (spurious ?) > > Let me know if you have any thoughts . > > Thanks, > Anoop > > ################# log ############# > > Linux version 2.6.33.7-pmc (paanoop1@paanoop1-desktop) (gcc version > 4.5.1 (GCC) ) #27 SMP PREEMPT Thu Dec 16 17:49:46 IST 2010 > DSPRAM0: PA=1c100000,Size=00008000,enabled > UART clock set to 50000000 > CPU revision is: 00019548 (MIPS 34Kc) > Determined physical RAM map: > memory: 00001000 @ 00000000 (reserved) > memory: 000ff000 @ 00001000 (usable) > memory: 00271000 @ 00100000 (reserved) > memory: 0fc5a200 @ 00371000 (usable) > Wasting 32 bytes for tracking 1 unused pages > Zone PFN ranges: > Normal 0x00000000 -> 0x0000ffcb > Movable zone start PFN for each node > early_node_map[1] active PFN ranges > 0: 0x00000000 -> 0x0000ffcb > 6 available secondary CPU TC(s) > PERCPU: Embedded 7 pages/cpu @81203000 s4896 r8192 d15584 u65536 > pcpu-alloc: s4896 r8192 d15584 u65536 alloc=16*4096 > pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 > Built 1 zonelists in Zone order, mobility grouping on. Total pages: > 64971 > Kernel command line: console=ttyS0,57600 lpj=796672 > PID hash table entries: 1024 (order: 0, 4096 bytes) > Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) > Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) > Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes. > Primary data cache 64kB, 4-way, PIPT, no aliases, linesize 32 bytes > Writing ErrCtl register=00000000 > Readback ErrCtl register=00000000 > Memory: 255548k/259428k available (1861k kernel code, 3504k reserved, > 400k data, 156k init, 0k highmem) > Hierarchical RCU implementation. > NR_IRQS:128 > Clock rate set to 600000000 > console [ttyS0] enabled > Calibrating delay loop (skipped) preset value.. 
398.33 BogoMIPS > (lpj=796672) > Mount-cache hash table entries: 512 > Cpu 0 > $ 0 : 00000000 10102000 00000010 00000003 > $ 4 : 00000003 00000000 00000000 8f82f758 > $ 8 : 00000000 00000000 00000000 00000000 > $12 : 00000000 00000007 8f82301c 00000000 > $16 : 8f82f758 00800b00 8035d3c0 8f830000 > $20 : 80329df8 00000000 8035d3c0 80360000 > $24 : 00000000 00000001 > $28 : 80328000 80329ce0 8f82f868 8010d018 > Hi : 0000004c > Lo : 3831f4b4 > epc : 8010d054 copy_thread+0x88/0x348 > Not tainted > ra : 8010d018 copy_thread+0x4c/0x348 > Status: 10102000 KERNEL > Cause : 50804068 > PrId : 00019548 (MIPS 34Kc) > Kernel panic - not syncing: Unexpected DSP exception > > ^ permalink raw reply [flat|nested] 68+ messages in thread
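The hypothesis in the message above is that code which used to receive the whole Status (or TCStatus) value in "flags" now receives less, and that callers relying on the old behaviour either misreport or trash the register. A small user-space model in C makes the failure mode concrete. Everything here is an assumption for illustration: model_status stands in for the CP0 register, save_full() and save_ie_only() are hypothetical stand-ins for the two behaviours of a save primitive, and none of this is the actual local_irq_save() implementation from any kernel version. It also makes no attempt to reproduce the exact TCStatus value printed in the dump.

```c
#include <stdint.h>

#define SKETCH_ST_IE 0x00000001u  /* interrupt enable, bit 0            */
#define SKETCH_ST_MX 0x01000000u  /* DSP ASE access, bit 24 of Status   */

/* Hypothetical stand-in for the hardware status register. */
static uint32_t model_status;

static void model_reset(uint32_t value) { model_status = value; }
static uint32_t model_read(void)        { return model_status; }

/* Old-style behaviour as described above: flags carries the whole register. */
static uint32_t save_full(void)
{
    uint32_t flags = model_status;
    model_status &= ~SKETCH_ST_IE;      /* disable interrupts */
    return flags;
}

/* Narrowed behaviour: flags carries only the interrupt-enable bit. */
static uint32_t save_ie_only(void)
{
    uint32_t flags = model_status & SKETCH_ST_IE;
    model_status &= ~SKETCH_ST_IE;      /* disable interrupts */
    return flags;
}

/* A caller written against the old behaviour: it treats flags as the
 * full register, both when reporting and when restoring. */
static uint32_t report_status(uint32_t flags) { return flags; }
static void restore_whole(uint32_t flags)     { model_status = flags; }
```

Starting from 0x11004001, the save_full() path reports and restores the original value. The save_ie_only() path reports 0x00000001 and, after restore_whole(), leaves the model register with MX cleared, which is the kind of accidental loss of DSP access rights the message speculates about.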
Thread overview: 68+ messages
2010-12-14 21:27 SMTC support status in latest git head STUART VENTERS
2010-12-14 21:27 ` STUART VENTERS
2010-12-14 23:01 ` Kevin D. Kissell
-- strict thread matches above, loose matches on Subject: below --
2010-12-16 15:37 STUART VENTERS
2010-12-16 15:37 ` STUART VENTERS
[not found] ` <4D0A677C.6040104@paralogos.com>
2010-12-16 19:58 ` Kevin D. Kissell
2010-12-17 21:35 ` Kevin D. Kissell
2010-12-20 10:44 ` Anoop P A
[not found] ` <4D10F7A9.1020306@paralogos.com>
2010-12-21 20:06 ` Anoop P.A.
2010-12-21 20:06 ` Anoop P.A.
2010-12-21 20:29 ` Anoop P.A.
2010-12-21 20:29 ` Anoop P.A.
2010-12-22 10:27 ` Kevin D. Kissell
2010-12-22 11:35 ` Anoop P A
2010-12-22 11:37 ` Kevin D. Kissell
2010-12-22 11:51 ` Anoop P A
2010-12-22 13:03 ` Kevin D. Kissell
2010-12-22 16:34 ` STUART VENTERS
2010-12-22 16:34 ` STUART VENTERS
2010-12-23 21:09 ` STUART VENTERS
2010-12-23 21:09 ` STUART VENTERS
2010-12-24 12:32 ` Kevin D. Kissell
2010-12-24 14:39 ` Anoop P A
2010-12-24 14:53 ` Kevin D. Kissell
2010-12-24 16:02 ` Anoop P A
2010-12-24 23:34 ` Kevin D. Kissell
2010-12-25 7:32 ` Anoop P A
2010-12-25 15:17 ` Kevin D. Kissell
2010-12-27 15:49 ` STUART VENTERS
2010-12-27 15:49 ` STUART VENTERS
2010-12-27 17:19 ` Anoop P A
2010-12-28 8:19 ` Anoop P A
2010-12-28 8:43 ` Kevin D. Kissell
2010-12-31 12:27 ` Anoop P A
2011-01-01 8:42 ` Kevin D. Kissell
2011-01-03 15:12 ` Anoop P A
2011-01-03 16:14 ` Kevin D. Kissell
2011-01-03 19:20 ` Anoop P A
2011-01-04 8:17 ` Kevin D. Kissell
2011-01-04 13:02 ` Anoop P A
2011-01-04 14:37 ` Anoop P A
2011-01-04 17:21 ` Kevin D. Kissell
2011-01-04 17:54 ` Anoop P A
2011-01-04 18:33 ` Kevin D. Kissell
2011-01-05 13:11 ` Anoop P A
2011-01-05 19:23 ` Kevin D. Kissell
2011-01-06 20:23 ` Anoop P A
2011-01-06 23:31 ` Kevin D. Kissell
2011-01-07 7:56 ` Anoop P A
2011-01-07 18:46 ` Kevin D. Kissell
2011-01-08 19:33 ` Anoop P A
2011-01-10 19:30 ` Kevin D. Kissell
2011-01-11 4:05 ` Anoop P A
2011-01-13 7:53 ` Kevin D. Kissell
2011-01-04 17:40 ` Kevin D. Kissell
2011-01-05 13:09 ` Anoop P A
2010-12-08 13:48 Anoop P.A.
2010-12-08 13:48 ` Anoop P.A.
2010-12-09 17:07 ` Ralf Baechle
2010-12-09 18:52 ` Kevin D. Kissell
2010-12-14 15:25 ` Anoop P.A.
2010-12-14 15:25 ` Anoop P.A.
2010-12-14 18:32 ` Kevin D. Kissell
2010-12-14 18:50 ` Ralf Baechle
2010-12-15 19:18 ` Anoop P A
2010-12-15 19:58 ` Kevin D. Kissell
2010-12-16 13:03 ` Anoop P A
2010-12-16 18:43 ` Kevin D. Kissell