* Hang with isync
@ 2006-09-19 1:28 Manoj Sharma
2006-09-20 1:16 ` Manoj Sharma
0 siblings, 1 reply; 11+ messages in thread
From: Manoj Sharma @ 2006-09-19 1:28 UTC (permalink / raw)
To: linuxppc-dev
Hi,
We use Linux kernel 2.4.20 on a ppc405, and the system hangs once in a while
when isync is executed in this function:
_GLOBAL(_nmask_and_or_msr)
        mfmsr   r0              /* Get current msr */
        andc    r0,r0,r3        /* And off the bits set in r3 (first parm) */
        or      r0,r0,r4        /* Or on the bits in r4 (second parm) */
        sync                    /* Some chip revs have problems here... */
        isync
        mtmsr   r0              /* Update machine state */
        isync
        blr                     /* Done */
From 2.5 onwards, I see that "sync; isync" has been replaced by a macro, SYNC
(defined only for the 601). I can't find this change, or the reason for it,
in any changelog.
Can someone give some information on this change?
I'd appreciate any help.
Manoj
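For readers following along, the effect of this function on the MSR can be modelled off-target in a few lines. This is only a sketch of the bit arithmetic in the listing (andc, then or); the Python function name is made up for illustration, and the sync/isync instructions have no equivalent here.

```python
MASK32 = 0xFFFFFFFF  # the 405 MSR is a 32-bit register


def nmask_and_or_msr(msr, nmask, ormask):
    """Model of the listing: clear the bits in nmask (r3), then set the bits in ormask (r4)."""
    cleared = msr & ~nmask & MASK32      # andc r0,r0,r3
    return (cleared | ormask) & MASK32   # or   r0,r0,r4


# Arbitrary example values, not taken from a real system.
print(hex(nmask_and_or_msr(0x0000F0F0, 0x000000F0, 0x00000001)))  # 0xf001
```

The model only shows which bits end up in r0 before mtmsr; it says nothing about timing or about what the core does once the new MSR value takes effect.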
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-19 1:28 Hang with isync Manoj Sharma
@ 2006-09-20 1:16 ` Manoj Sharma
2006-09-20 21:35 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 11+ messages in thread
From: Manoj Sharma @ 2006-09-20 1:16 UTC (permalink / raw)
To: linuxppc-dev
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-20 1:16 ` Manoj Sharma
@ 2006-09-20 21:35 ` Benjamin Herrenschmidt
2006-09-20 22:31 ` Manoj Sharma
0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2006-09-20 21:35 UTC (permalink / raw)
To: Manoj Sharma; +Cc: linuxppc-dev
On Tue, 2006-09-19 at 18:16 -0700, Manoj Sharma wrote:
> Hi,
>
> We use linux kernel 2.4.20 on ppc405 and the system hangs once
> in a while when isync gets called in this function:
>
> _GLOBAL(_nmask_and_or_msr)
> mfmsr r0 /* Get current msr */
> andc r0,r0,r3 /* And off the bits set in r3 (first parm) */
> or r0,r0,r4 /* Or on the bits in r4 (second parm) */
> sync /* Some chip revs have problems here... */
> isync
> mtmsr r0 /* Update machine state */
> isync
> blr /* Done */
>
> 2.5 onwards, I find that "sync; isync" has been replaced by a
> macro SYNC (defined only for 601). I don't find it in any
> changelog and reason for the change.
>
> Can someone give some information on this change?
Regardless of the change... on 2.4, _nmask_and_or_msr() was used for a
number of things. We would need to know where it was called from with
what values as arguments to have an idea of what's going wrong. It's
probably not dying on the isync, but rather on the following mtmsr due
to a problem with the values passed in....
Ben.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-20 21:35 ` Benjamin Herrenschmidt
@ 2006-09-20 22:31 ` Manoj Sharma
2006-09-20 22:38 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 11+ messages in thread
From: Manoj Sharma @ 2006-09-20 22:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
This is the stack trace.
Registers:
GPR00: 00069030 C01F3000 C01F1080 00000000 00048000 C0639F48 C01F1080 FFFFFC18
GPR08: C02203FC 00000020 C0638000 C01F31B0 42FEE022 1056A7F8 00FE502A 00000000
GPR16: 00000000 FFC44232 00000000 00000000 FFC441EC 00080000 00010000 0000000A
GPR24: 00000000 0007CD80 00000CE0 00000000 00000000 C02B0000 00000000 C02B0000
NIP; c0005da4 _<_nmask_and_or_msr+0x18/0x20 [kernel]>
Trace; c0025328 _<check_pgt_cache+0x20/0x30 [kernel]>
Trace; c0004f4c _<idled+0x58/0x70 [kernel]>
Trace; c0004f74 _<cpu_idle+0x10/0x24 [kernel]>
Trace; c00012b0 _<rest_init+0x30/0x40 [kernel]>
Trace; c02a45a4 _<start_kernel+0x168/0x17c [kernel]>
Trace; c0000250 _<skpinv+0x1f8/0x234 [kernel]>
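The NIP offset in the trace can be mapped back to an instruction by hand. A small sketch, assuming the eight-instruction listing from the first message is exactly what is linked into this kernel (the reported function size of 0x20 bytes is consistent with eight 4-byte PowerPC instructions); the helper name is hypothetical.

```python
# Instructions of _nmask_and_or_msr in listing order; each is 4 bytes on PowerPC.
LISTING = [
    "mfmsr r0",
    "andc r0,r0,r3",
    "or r0,r0,r4",
    "sync",
    "isync",
    "mtmsr r0",
    "isync",
    "blr",
]


def instruction_at(offset):
    """Return the instruction at a byte offset from the start of the function."""
    if offset % 4 != 0 or not 0 <= offset < 4 * len(LISTING):
        raise ValueError("offset is not an instruction boundary inside the function")
    return LISTING[offset // 4]


print(hex(4 * len(LISTING)))  # 0x20, matching the size in "+0x18/0x20"
print(instruction_at(0x18))   # isync (the one that follows mtmsr)
```

Under that assumption the NIP sits on the second isync, which suggests that mtmsr had already been issued when the trace was taken.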
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-20 22:31 ` Manoj Sharma
@ 2006-09-20 22:38 ` Benjamin Herrenschmidt
2006-09-20 23:04 ` Linas Vepstas
0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2006-09-20 22:38 UTC (permalink / raw)
To: Manoj Sharma; +Cc: linuxppc-dev
On Wed, 2006-09-20 at 15:31 -0700, Manoj Sharma wrote:
> This is the stack trace.
>
> Registers:
> GPR00: 00069030 C01F3000 C01F1080 00000000 00048000 C0639F48 C01F1080
> FFFFFC18
> GPR08: C02203FC 00000020 C0638000 C01F31B0 42FEE022 1056A7F8 00FE502A
> 00000000
> GPR16: 00000000 FFC44232 00000000 00000000 FFC441EC 00080000 00010000
> 0000000A
> GPR24: 00000000 0007CD80 00000CE0 00000000 00000000 C02B0000 00000000
> C02B0000
>
> NIP; c0005da4 _<_nmask_and_or_msr+0x18/0x20 [kernel]>
> Trace; c0025328 _<check_pgt_cache+0x20/0x30 [kernel]>
> Trace; c0004f4c _<idled+0x58/0x70 [kernel]>
> Trace; c0004f74 _<cpu_idle+0x10/0x24 [kernel]>
> Trace; c00012b0 _<rest_init+0x30/0x40 [kernel]>
> Trace; c02a45a4 _<start_kernel+0x168/0x17c [kernel]>
> Trace; c0000250 _<skpinv+0x1f8/0x234 [kernel]>
Is this upstream 2.4.20, or do you have any additional patches? (Like
MontaVista stuff or RT Linux or whatever?)
It's unclear to me from just that backtrace what the mask is... it could
just be re-enabling interrupts while you have a stale IRQ line asserted....
Have you also checked the errata list for your 405 core, in case it has
a known issue?
Ben.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-20 22:38 ` Benjamin Herrenschmidt
@ 2006-09-20 23:04 ` Linas Vepstas
2006-09-21 0:59 ` Manoj Sharma
0 siblings, 1 reply; 11+ messages in thread
From: Linas Vepstas @ 2006-09-20 23:04 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On Thu, Sep 21, 2006 at 08:38:13AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2006-09-20 at 15:31 -0700, Manoj Sharma wrote:
> > This is the stack trace.
> >
> > Registers:
> > GPR00: 00069030
This is the MSR and it has the user-mode bit set, which is surely wrong.
This is not how one gets to user space.
00048000
The MSR had this or'ed into it, which is setting the user-mode bit.
Surely that's wrong.
--linas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-20 23:04 ` Linas Vepstas
@ 2006-09-21 0:59 ` Manoj Sharma
2006-09-21 4:24 ` Liu Dave-r63238
0 siblings, 1 reply; 11+ messages in thread
From: Manoj Sharma @ 2006-09-21 0:59 UTC (permalink / raw)
To: Linas Vepstas; +Cc: linuxppc-dev
No, the MSR is 00029030 and the user-mode bit is not set here.
I left this part out of the previous mail:
NIP: C0005DA4 XER: 20000000 LR: C0004FE4 SP: C01F3000 REGS: c01eff30 TRAP: 1020 Not tainted
MSR: 00029030 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c01f1080[0] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
PLB0: bear= 0x08000000 acr= 0xbb000000 besr= 0x00000000
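The MSR fields printed in the oops can be cross-checked by decoding the raw values. The bit masks below are assumed to match the 4xx MSR layout used by the kernel's PowerPC register headers; they are assumptions here and should be verified against the 405 core manual. The function name is made up for illustration.

```python
# PPC 4xx MSR bit masks (assumed; check against your kernel headers / core manual).
MSR_BITS = [
    ("WE", 0x00040000),  # wait state enable
    ("CE", 0x00020000),  # critical interrupt enable
    ("EE", 0x00008000),  # external interrupt enable
    ("PR", 0x00004000),  # problem state (user mode)
    ("FP", 0x00002000),  # floating point available
    ("ME", 0x00001000),  # machine check enable
    ("IR", 0x00000020),  # instruction relocate
    ("DR", 0x00000010),  # data relocate
]


def decode_msr(value):
    """Return the names of the known MSR bits that are set in value."""
    return [name for name, mask in MSR_BITS if value & mask]


# The MSR from the oops, GPR00 and GPR04 from the register dump.
for v in (0x00029030, 0x00069030, 0x00048000):
    print(f"{v:08X}: {' '.join(decode_msr(v))}")
```

With these masks, 00029030 decodes to CE EE ME IR DR, which agrees with the "EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11" line, and PR is clear in all three values; 00048000 decodes to WE plus EE.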
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Hang with isync
2006-09-21 0:59 ` Manoj Sharma
@ 2006-09-21 4:24 ` Liu Dave-r63238
2006-09-21 6:17 ` Manoj Sharma
0 siblings, 1 reply; 11+ messages in thread
From: Liu Dave-r63238 @ 2006-09-21 4:24 UTC (permalink / raw)
To: Manoj Sharma, Linas Vepstas; +Cc: linuxppc-dev
> No, the MSR is 00029030 and the user-mode bit is not set here.
> I left this part out of the previous mail:
> NIP: C0005DA4 XER: 20000000 LR: C0004FE4 SP: C01F3000 REGS: c01eff30 TRAP: 1020 Not tainted
> MSR: 00029030 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c01f1080[0] 'swapper' Last syscall: 120
> last math 00000000 last altivec 00000000
> PLB0: bear= 0x08000000 acr= 0xbb000000 besr= 0x00000000
Dave> Look at the MSR and TRAP values. MSR is 00029030, so critical
interrupts are enabled (MSR[CE] is set).
Dave> TRAP is 1020: a watchdog timer exception is happening.
Dave> You can clear the MSR[CE] bit so that no critical exceptions are
taken, or disable the WD timer.
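The TRAP field can be turned into an exception name with a small lookup. The table below is an assumed subset of the 40x exception vector offsets and should be checked against the 405 core manual; the function name is hypothetical.

```python
# Exception vector offsets for 40x cores (assumed subset; verify against the 405 manual).
VECTORS_40X = {
    0x0100: "critical input",
    0x0200: "machine check",
    0x0500: "external interrupt",
    0x0700: "program",
    0x0C00: "system call",
    0x1000: "programmable interval timer",
    0x1010: "fixed interval timer",
    0x1020: "watchdog timer",
}


def trap_name(trap_field):
    """The oops prints TRAP in hex without a 0x prefix, e.g. '1020'."""
    return VECTORS_40X.get(int(trap_field, 16), "unknown")


print(trap_name("1020"))  # watchdog timer
```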
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-21 4:24 ` Liu Dave-r63238
@ 2006-09-21 6:17 ` Manoj Sharma
2006-09-21 6:27 ` Liu Dave-r63238
0 siblings, 1 reply; 11+ messages in thread
From: Manoj Sharma @ 2006-09-21 6:17 UTC (permalink / raw)
To: Liu Dave-r63238; +Cc: linuxppc-dev
Dave, the watchdog timeout is around one second, and no CPU activity for
that long means something is wrong. Is it OK to disable it and hide a
problem that lies somewhere else? Do you think the sync-isync instructions
could be the cause, and that moving to 2.6 might resolve it?
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Hang with isync
2006-09-21 6:17 ` Manoj Sharma
@ 2006-09-21 6:27 ` Liu Dave-r63238
2006-09-21 7:11 ` Manoj Sharma
0 siblings, 1 reply; 11+ messages in thread
From: Liu Dave-r63238 @ 2006-09-21 6:27 UTC (permalink / raw)
To: Manoj Sharma; +Cc: linuxppc-dev
First, you must confirm that it really is happening in the watchdog timer
exception. If it is, you need to choose a suitable way to fix it.
Second, I don't believe the sync-isync instructions are causing it.
You can try 2.6, but I don't know whether a 2.6 kernel will resolve your
problem.
-Dave
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with isync
2006-09-21 6:27 ` Liu Dave-r63238
@ 2006-09-21 7:11 ` Manoj Sharma
0 siblings, 0 replies; 11+ messages in thread
From: Manoj Sharma @ 2006-09-21 7:11 UTC (permalink / raw)
To: Liu Dave-r63238; +Cc: linuxppc-dev
The hang has triggered the watchdog timer exception. That indicates there is
a problem somewhere, but I don't know what it is.
Even with 2.4, it does not happen regularly. It occurs once in a while and
is not reproducible.
^ permalink raw reply [flat|nested] 11+ messages in thread
Thread overview: 11+ messages
2006-09-19 1:28 Hang with isync Manoj Sharma
2006-09-20 1:16 ` Manoj Sharma
2006-09-20 21:35 ` Benjamin Herrenschmidt
2006-09-20 22:31 ` Manoj Sharma
2006-09-20 22:38 ` Benjamin Herrenschmidt
2006-09-20 23:04 ` Linas Vepstas
2006-09-21 0:59 ` Manoj Sharma
2006-09-21 4:24 ` Liu Dave-r63238
2006-09-21 6:17 ` Manoj Sharma
2006-09-21 6:27 ` Liu Dave-r63238
2006-09-21 7:11 ` Manoj Sharma