From: "Kevin D. Kissell" <kevink@paralogos.com>
To: Anoop P A <anoop.pa@gmail.com>
Cc: STUART VENTERS <stuart.venters@adtran.com>,
"Anoop P.A." <Anoop_P.A@pmc-sierra.com>,
linux-mips@linux-mips.org
Subject: Re: SMTC support status in latest git head.
Date: Mon, 03 Jan 2011 08:14:11 -0800 [thread overview]
Message-ID: <4D21F5D3.50604@paralogos.com> (raw)
In-Reply-To: <1294067561.27661.293.camel@paanoop1-desktop>
The very first SMTC implementations didn't support full kernel-mode
preemption, which anyway wasn't a priority, given the hardware event
response support in MIPS MT. I believe it was later made compatible,
but it was never extensively exercised. Since SMTC has fingers in some
pretty low-level atomicity mechanisms, if a new, parallel set was
implemented for RCU, I can easily imagine that nobody has yet
implemented SMTC-ified variants of that set.
Your last statement isn't very clear, though. Are you saying that if
you configure for no forced preemption and with TREE_CPU, the 2.6.37
kernel boots all the way up, or that it simply hangs later? What's the
last rev kernel that actually boots all the way up?
Regards,
Kevin K.
On 1/3/2011 7:12 AM, Anoop P A wrote:
> Hi ,
>
> Following patch restricts TREE_CPU RCU implementation only for !PREEMPT
> SMP kernel.
> http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366
>
> CONFIG_TREE_PREEMPT_RCU option seems to be not working for SMTC kernel
> ( which will be only available RCU implementation for SMTC kernel from
> 2.6.37 onwards) .
>
> With no forced preemption and selecting TREE_CPU I am able to boot
> further to the hang that I have reported.
>
> Thanks
> Anoop
>
> On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote:
>> At this point the logical thing to do would seem to look at your kernel
>> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0
>> shows the last exception to have been taken. That's a critical SMTC
>> routine that gets called whenever an xxx_irq_restore() enables
>> interrupts, so that virtual per-TC IPI interrupts that were posted while
>> the TC had interrupts disabled can be handled deterministically. As I
>> mentioned in an earlier message, there was some cleanup work from David
>> Howell that changed a number of irq management-related function names
>> and prototypes across all architectures, which went into linux-mips.org
>> at very roughly the time of the breakage. The SMTC overlay over the irq
>> implementation has been pretty robust, but it's written in a perhaps
>> doomed attempt to be both efficient and using a maximum amount of common
>> code with the general case. A mechanical or semi-mechanical change
>> could conceivably have broken things.
>>
>> Regards,
>>
>> Kevin K.
>>
>>
>> On 12/31/2010 4:27 AM, Anoop P A wrote:
>>> Hi ,
>>>
>>> Kernel hangs on stop_machine call. Please find mt reg dump below.
>>> Another important observation is even though 2.6.33 kernel + stackframe
>>> patch well passes calibration hang , I am still unable boot in to a
>>> initramfs root ( verified ramfs working with VSMP). So it looks like
>>> still some issue to fix between 2.6.32 and 2.6.33 .
>>> ######################## Log ###########################
>>>
>>> === MIPS MT State Dump ===
>>> -- Global State --
>>> MVPControl Passed: 00000005
>>> MVPControl Read: 00000004
>>> MVPConf0 : a8008406
>>> -- per-VPE State --
>>> VPE 0
>>> VPEControl : 00008000
>>> VPEConf0 : 800f0003
>>> VPE0.Status : 11004201
>>> VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108
>>> VPE0.Cause : 50804000
>>> VPE0.Config7 : 00010000
>>> VPE 1
>>> VPEControl : 00068006
>>> VPEConf0 : 80cf0003
>>> VPE1.Status : 11008301
>>> VPE1.EPC : 801022a0 r4k_wait+0x20/0x40
>>> VPE1.Cause : 50800000
>>> VPE1.Config7 : 00010000
>>> -- per-TC State --
>>> TC 0 (current TC with VPE EPC above)
>>> TCStatus : 18102000
>>> TCBind : 00000000
>>> TCRestart : 803fa19c printk+0xc/0x30
>>> TCHalt : 00000000
>>> TCContext : 00000000
>>> TC 1
>>> TCStatus : 18902000
>>> TCBind : 00200000
>>> TCRestart : 801022a0 r4k_wait+0x20/0x40
>>> TCHalt : 00000000
>>> TCContext : 00140000
>>> TC 2
>>> TCStatus : 18902000
>>> TCBind : 00400000
>>> TCRestart : 801022a0 r4k_wait+0x20/0x40
>>> TCHalt : 00000000
>>> TCContext : 00280000
>>> TC 3
>>> TCStatus : 18902000
>>> TCBind : 00600000
>>> TCRestart : 801022a0 r4k_wait+0x20/0x40
>>> TCHalt : 00000000
>>> TCContext : 003c0000
>>> TC 4
>>> TCStatus : 18902000
>>> TCBind : 00800001
>>> TCRestart : 8010229c r4k_wait+0x1c/0x40
>>> TCHalt : 00000000
>>> TCContext : 00500000
>>> TC 5
>>> TCStatus : 18902000
>>> TCBind : 00a00001
>>> TCRestart : 8010229c r4k_wait+0x1c/0x40
>>> TCHalt : 00000000
>>> TCContext : 00640000
>>> TC 6
>>> TCStatus : 18902000
>>> TCBind : 00c00001
>>> TCRestart : 8010229c r4k_wait+0x1c/0x40
>>> TCHalt : 00000000
>>> TCContext : 00780000
>>> Counter Interrupts taken per CPU (TC)
>>> 0: 0
>>> 1: 0
>>> 2: 0
>>> 3: 0
>>> 4: 0
>>> 5: 0
>>> 6: 0
>>> 7: 0
>>> Self-IPI invocations:
>>> 0: 12
>>> 1: 0
>>> 2: 0
>>> 3: 0
>>> 4: 0
>>> 5: 5
>>> 6: 4
>>> 7: 0
>>> IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
>>> IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
>>> 0 Recoveries of "stolen" FPU
>>> ===========================
>>>
>>> ################################################################
>>>
>>> Thanks
>>> Anoop
>>>
>>> On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote:
>>>> I took a quick look last night, and the only thing that looked vaguely
>>>> dangerous in changes since the timer changes I alluded to earlier was
>>>> the global naming cleanup of irq-related function names that David
>>>> Howell submitted. The diff didn't look dangerous in itself, but some of
>>>> the definitions are nested subtly for SMTC to maximize the amount of
>>>> common code, and I could imagine something getting lost in translation
>>>> there. If that were really the problem, it would of course affect much
>>>> more than just the timer subsystem, but early in the boot process,
>>>> timers are pretty much the only interrupts that have to be handled
>>>> correctly.
>>>>
>>>> I'm travelling today, but will take a look at timekeeping_notify()
>>>> tomorrow or the next day...
>>>>
>>>> /K.
>>>>
>>>> On 12/28/10 12:19 AM, Anoop P A wrote:
>>>>> Hi,
>>>>>
>>>>> I had a glance into the code diff without notice of any suspect-able
>>>>> code .
>>>>> Tracing the hang showed that it is getting hanged in timekeeping_notify
>>>>> function.
>>>>>
>>>>> Thanks,
>>>>> Anoop
>>>>>
>>>>> PS: I may not be available until Thursday
>>>>>
>>>>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote:
>>>>>> Hi Kevin,
>>>>>>
>>>>>> It is very unlikely that the patch you pointed has any impact on the the
>>>>>> hang I am seeing. The patch you have mentioned got into kernel around
>>>>>> 2.6.32 timeframe. I am able to boot both 2.6.32 and 2.6.33 kernel ( +
>>>>>> stackframe patch) .
>>>>>>
>>>>>> Hi Stuart,
>>>>>>
>>>>>> I haven't got much time to spend on this today.
>>>>>>
>>>>>> I had got 2.6.36-stable(+ stack frame patch) booting last day and I have
>>>>>> observed hang issue with 2.6.37-rc1 ( Same as rc6 and current git head)
>>>>>>
>>>>>> So probably some patches in 2.6.37 branch introduced this hang.
>>>>>>
>>>>>> Hopefully I will get some free slot tomorrow so that I can look into
>>>>>> code diff .
>>>>>>
>>>>>> Thanks
>>>>>> Anoop
>>>>>>
>>>>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote:
>>>>>>> Kevin,
>>>>>>>
>>>>>>> Outstanding, sometimes it's better to be lucky than good.
>>>>>>>
>>>>>>>
>>>>>>> Anoop,
>>>>>>>
>>>>>>> Maybe we can get lucky again.
>>>>>>>
>>>>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions,
>>>>>>> I'll be happy to do another diff.
>>>>>>>
>>>>>>>
>>>>>>> Hope you'll have had a good Christmas as well.
>>>>>>> We've had snow in Alabama since Christmas eve!
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Stuart
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Kevin D. Kissell [mailto:kevink@paralogos.com]
>>>>>>> Sent: Friday, December 24, 2010 5:34 PM
>>>>>>> To: Anoop P A
>>>>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@linux-mips.org
>>>>>>> Subject: Re: SMTC support status in latest git head.
>>>>>>>
>>>>>>>
>>>>>>> Ah, well, at least we have a stackframe.h fix that preserves David's
>>>>>>> performance tweak for the deeper pipelined processors. In looking for
>>>>>>> this, I did notice that someone did some modification to the SMTC clock
>>>>>>> tick logic that I was skeptical had ever been tested. If you've still
>>>>>>> got that kernel binary handy, you might check to see if it boots with
>>>>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2.
>>>>>>>
>>>>>>> Oh, yes, and Merry Christmas one and all!
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Kevin K.
>>>>>>>
>>>>>>> On 12/24/10 8:02 AM, Anoop P A wrote:
>>>>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote:
>>>>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also
>>>>>>>>> fix things, while preserving the other fixes and performance enhancements?
>>>>>>>>>
>>>>>>>> I have tested that patch with 2.6.37 branch it well passes calibration
>>>>>>>> loop but hangs after switching to mips closource
>>>>>>>>
>>>>>>>> TC 6 going on-line as CPU 6
>>>>>>>> Brought up 7 CPUs
>>>>>>>> bio: create slab<bio-0> at 0
>>>>>>>> SCSI subsystem initialized
>>>>>>>> Switching to clocksource MIPS
>>>>>>>>
>>>>>>>> I Presume this is a different issue as restoring older file didn't help
>>>>>>>> much to get rid of this hang.
>>>>>>>>
>>>>>>>> diff --git a/arch/mips/include/asm/stackframe.h
>>>>>>>> b/arch/mips/include/asm/stackframe.h
>>>>>>>> index 58730c5..7fc9f10 100644
>>>>>>>> --- a/arch/mips/include/asm/stackframe.h
>>>>>>>> +++ b/arch/mips/include/asm/stackframe.h
>>>>>>>> @@ -195,9 +195,9 @@
>>>>>>>> * to cover the pipeline delay.
>>>>>>>> */
>>>>>>>> .set mips32
>>>>>>>> - mfc0 v1, CP0_TCSTATUS
>>>>>>>> + mfc0 v0, CP0_TCSTATUS
>>>>>>>> .set mips0
>>>>>>>> - LONG_S v1, PT_TCSTATUS(sp)
>>>>>>>> + LONG_S v0, PT_TCSTATUS(sp)
>>>>>>>> #endif /* CONFIG_MIPS_MT_SMTC */
>>>>>>>> LONG_S $4, PT_R4(sp)
>>>>>>>> LONG_S $5, PT_R5(sp)
>>>>>>>>
>>>>>>>>
>>>>>>>>> /K.
>>>>>>>>>
>>>>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote:
>>>>>>>>>> Hi Kevin, Stuart ,
>>>>>>>>>>
>>>>>>>>>> Woohooo You guys spotted !.
>>>>>>>>>>
>>>>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be
>>>>>>>>>> the culprit
>>>>>>>>>>
>>>>>>>>>> Once I restored previous version of stackframe.h 2.6.33-stable started
>>>>>>>>>> booting !.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anoop
>>>>>>>>>>
>>>>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote:
>>>>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between
>>>>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved
>>>>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204,
>>>>>>>>>>> depending on the version) from two instructions after the mfc0 to a
>>>>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of
>>>>>>>>>>> the register access. Unfortunately, the v1 register is also used in the
>>>>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets
>>>>>>>>>>> clobbered before it gets stored. This will eventually result in the
>>>>>>>>>>> Status register getting a TCStatus value, which has some bits on common,
>>>>>>>>>>> but isn't identical and sooner or later Bad Things will happen.
>>>>>>>>>>>
>>>>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch.
>>>>>>>>>>>
>>>>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS
>>>>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance
>>>>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2
>>>>>>>>>>> instead of v1/$3, as the staging register for the TCStatus value. I'd
>>>>>>>>>>> lean toward the second option, but I'm not in a position to test and
>>>>>>>>>>> submit a patch just now.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Kevin K.
>>>>>>>>>>>
>>>>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote:
>>>>>>>>>>>> Kevin,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not sure if it's useful,
>>>>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out.
>>>>>>>>>>>> works 2.6.32-stable with patch 804
>>>>>>>>>>>> works_not 2.6.33-stable
>>>>>>>>>>>>
>>>>>>>>>>>> greping for files with CONFIG_MIPS_MT_SMTC
>>>>>>>>>>>> and looking for timer interrupt related stuff found the following differences:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> arch/mips/include/asm/irq.h
>>>>>>>>>>>> arch/mips/kernel/irq.c
>>>>>>>>>>>> do_IRQ
>>>>>>>>>>>>
>>>>>>>>>>>> arch/mips/include/asm/stackframe.h
>>>>>>>>>>>> SAVE_SOME SAVE_TEMP get/set_saved_sp
>>>>>>>>>>>>
>>>>>>>>>>>> arch/mips/include/asm/time.h
>>>>>>>>>>>> clocksource_set_clock
>>>>>>>>>>>>
>>>>>>>>>>>> arch/mips/kernel/process.c
>>>>>>>>>>>> cpu_idle
>>>>>>>>>>>>
>>>>>>>>>>>> arch/mips/kernel/smtc.c
>>>>>>>>>>>> __irq_entry
>>>>>>>>>>>> ipi_decode
>>>>>>>>>>>> SMTC_CLOCK_TICK
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Enclosed are the two subsets of files for a more expert look.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll try to look in more detail after Christmas.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Stuart
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>
>
next prev parent reply other threads:[~2011-01-03 16:14 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-16 15:37 SMTC support status in latest git head STUART VENTERS
2010-12-16 15:37 ` STUART VENTERS
[not found] ` <4D0A677C.6040104@paralogos.com>
2010-12-16 19:58 ` Kevin D. Kissell
2010-12-17 21:35 ` Kevin D. Kissell
2010-12-20 10:44 ` Anoop P A
[not found] ` <4D10F7A9.1020306@paralogos.com>
2010-12-21 20:06 ` Anoop P.A.
2010-12-21 20:06 ` Anoop P.A.
2010-12-21 20:29 ` Anoop P.A.
2010-12-21 20:29 ` Anoop P.A.
2010-12-22 10:27 ` Kevin D. Kissell
2010-12-22 11:35 ` Anoop P A
2010-12-22 11:37 ` Kevin D. Kissell
2010-12-22 11:51 ` Anoop P A
2010-12-22 13:03 ` Kevin D. Kissell
2010-12-22 16:34 ` STUART VENTERS
2010-12-22 16:34 ` STUART VENTERS
2010-12-23 21:09 ` STUART VENTERS
2010-12-23 21:09 ` STUART VENTERS
2010-12-24 12:32 ` Kevin D. Kissell
2010-12-24 14:39 ` Anoop P A
2010-12-24 14:53 ` Kevin D. Kissell
2010-12-24 16:02 ` Anoop P A
2010-12-24 23:34 ` Kevin D. Kissell
2010-12-25 7:32 ` Anoop P A
2010-12-25 15:17 ` Kevin D. Kissell
2010-12-27 15:49 ` STUART VENTERS
2010-12-27 15:49 ` STUART VENTERS
2010-12-27 17:19 ` Anoop P A
2010-12-28 8:19 ` Anoop P A
2010-12-28 8:43 ` Kevin D. Kissell
2010-12-31 12:27 ` Anoop P A
2011-01-01 8:42 ` Kevin D. Kissell
2011-01-03 15:12 ` Anoop P A
2011-01-03 16:14 ` Kevin D. Kissell [this message]
2011-01-03 19:20 ` Anoop P A
2011-01-04 8:17 ` Kevin D. Kissell
2011-01-04 13:02 ` Anoop P A
2011-01-04 14:37 ` Anoop P A
2011-01-04 17:21 ` Kevin D. Kissell
2011-01-04 17:54 ` Anoop P A
2011-01-04 18:33 ` Kevin D. Kissell
2011-01-05 13:11 ` Anoop P A
2011-01-05 19:23 ` Kevin D. Kissell
2011-01-06 20:23 ` Anoop P A
2011-01-06 23:31 ` Kevin D. Kissell
2011-01-07 7:56 ` Anoop P A
2011-01-07 18:46 ` Kevin D. Kissell
2011-01-08 19:33 ` Anoop P A
2011-01-10 19:30 ` Kevin D. Kissell
2011-01-11 4:05 ` Anoop P A
2011-01-13 7:53 ` Kevin D. Kissell
2011-01-04 17:40 ` Kevin D. Kissell
2011-01-05 13:09 ` Anoop P A
-- strict thread matches above, loose matches on Subject: below --
2010-12-14 21:27 STUART VENTERS
2010-12-14 21:27 ` STUART VENTERS
2010-12-14 23:01 ` Kevin D. Kissell
2010-12-08 13:48 Anoop P.A.
2010-12-08 13:48 ` Anoop P.A.
2010-12-09 17:07 ` Ralf Baechle
2010-12-09 18:52 ` Kevin D. Kissell
2010-12-14 15:25 ` Anoop P.A.
2010-12-14 15:25 ` Anoop P.A.
2010-12-14 18:32 ` Kevin D. Kissell
2010-12-14 18:50 ` Ralf Baechle
2010-12-15 19:18 ` Anoop P A
2010-12-15 19:58 ` Kevin D. Kissell
2010-12-16 13:03 ` Anoop P A
2010-12-16 18:43 ` Kevin D. Kissell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D21F5D3.50604@paralogos.com \
--to=kevink@paralogos.com \
--cc=Anoop_P.A@pmc-sierra.com \
--cc=anoop.pa@gmail.com \
--cc=linux-mips@linux-mips.org \
--cc=stuart.venters@adtran.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.