* Kernel locks up after calling kernel_execve()
@ 2007-11-08 21:47 Gerhard Pircher
2007-11-08 23:20 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-08 21:47 UTC (permalink / raw)
To: linuxppc-dev
Hi,
I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2
kernel snapshot. The kernel runs through all initcalls, but locks up
completely after starting init (/sbin/init) via kernel_execve(). As a
result, I couldn't capture any kernel oops or panic output, and the magic
sysrq key doesn't work either. Enabling the soft-lockup detector and
spinlock debugging didn't reveal anything.
I'm not sure, but I think it is the same problem I have had with all
kernels >= 2.6.17. All of these kernels lock up shortly before or right at
the point of calling the init program (or rather, as soon as the kernel
forks some kernel threads).
Any suggestions on how to track down this problem?
regards,
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-08 21:47 Kernel locks up after calling kernel_execve() Gerhard Pircher
@ 2007-11-08 23:20 ` Benjamin Herrenschmidt
2007-11-09 7:41 ` Gerhard Pircher
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-08 23:20 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
On Thu, 2007-11-08 at 22:47 +0100, Gerhard Pircher wrote:
> Hi,
>
> I tested my patches for the AmigaOne platform with the latest 2.6.24-rc2
> kernel snapshot. The kernel runs through all initcalls, but locks up
> completely after calling INIT (/sbin/init) by kernel_execve(). Thus I
> couldn't capture any kernel oops or panic output. Also the magic sysrq
> key doesn't work. Enabling debug code for soft lockups and spinlock
> debugging didn't reveal any information.
> I'm not sure, but I think it is the same problem I had with all kernels
> >= 2.6.17. All of these kernels lock up shortly before or right at calling
> the init program (resp. as soon as the kernel forks some kernel threads).
> Any suggestions on how to track down this problem?
You don't have a HW debugger or anything like that?
Ben.
* Re: Kernel locks up after calling kernel_execve()
2007-11-08 23:20 ` Benjamin Herrenschmidt
@ 2007-11-09 7:41 ` Gerhard Pircher
2007-11-09 7:50 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-09 7:41 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Fri, 09 Nov 2007 10:20:17 +1100
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
>
> On Thu, 2007-11-08 at 22:47 +0100, Gerhard Pircher wrote:
> > Hi,
> >
> > I tested my patches for the AmigaOne platform with the latest
> > 2.6.24-rc2 kernel snapshot. The kernel runs through all initcalls, but
> > locks up completely after calling INIT (/sbin/init) by kernel_execve().
> > Thus I couldn't capture any kernel oops or panic output. Also the magic
> > sysrq key doesn't work. Enabling debug code for soft lockups and
> > spinlock debugging didn't reveal any information.
> > I'm not sure, but I think it is the same problem I had with all kernels
> > >= 2.6.17. All of these kernels lock up shortly before or right at
> > calling the init program (resp. as soon as the kernel forks some kernel
> > threads).
> > Any suggestions on how to track down this problem?
>
> You don't have a HW debugger or anything like that ?
>
> Ben.
Unfortunately, no. A BDI2000 is too costly for me.
I tried using /bin/sh as the init program and was able to enter a command,
but then the machine locked up, too.
Could that be a problem with the CPU sleep/idle code?
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-09 7:41 ` Gerhard Pircher
@ 2007-11-09 7:50 ` Benjamin Herrenschmidt
2007-11-10 17:11 ` Gerhard Pircher
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-09 7:50 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
> I tried to use /bin/sh as init program and was able to enter a command,
> but then the machine locked up, too.
> Could that be a problem with a CPU sleeping/idle code?
That's possibly an issue; try disabling power save, if any, for that CPU
type. If it used to work and then broke, you may have to bisect, though.
Ben.
* Re: Kernel locks up after calling kernel_execve()
2007-11-09 7:50 ` Benjamin Herrenschmidt
@ 2007-11-10 17:11 ` Gerhard Pircher
2007-11-11 3:55 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-10 17:11 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Fri, 09 Nov 2007 18:50:29 +1100
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
>
> > I tried to use /bin/sh as init program and was able to enter a command,
> > but then the machine locked up, too.
> > Could that be a problem with a CPU sleeping/idle code?
>
> That's possibly an issue, try disabling power save if any for that CPU
> type. If it worked and broke, you may have to bisect tho.
I disabled the power-saving code by adding powersave=off to the kernel's
command line, but it didn't help. It seems to lock up whenever it tries to
access a filesystem.
Is there a way to debug it without a hardware debugger, or can you
recommend a cheap hardware debugger?
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-10 17:11 ` Gerhard Pircher
@ 2007-11-11 3:55 ` Benjamin Herrenschmidt
2007-11-13 21:23 ` Gerhard Pircher
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-11 3:55 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
On Sat, 2007-11-10 at 18:11 +0100, Gerhard Pircher wrote:
> -------- Original Message --------
> > Date: Fri, 09 Nov 2007 18:50:29 +1100
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > To: Gerhard Pircher <gerhard_pircher@gmx.net>
> > CC: linuxppc-dev@ozlabs.org
> > Subject: Re: Kernel locks up after calling kernel_execve()
>
> >
> > > I tried to use /bin/sh as init program and was able to enter a command,
> > > but then the machine locked up, too.
> > > Could that be a problem with a CPU sleeping/idle code?
> >
> > That's possibly an issue, try disabling power save if any for that CPU
> > type. If it worked and broke, you may have to bisect tho.
> I disabled the powersaving code by adding powersave=off to the kernel's
> command line, but it didn't help. It seems to lockup whenever it tries to
> access a filesystem.
> Is there a way to debug it without a hardware debugger or can you
> recommend a cheap hardware debugger?
There are ways, sure, which probably involve adding printk()s all over the
place to figure it out... It could be some DMA issue, for example; it could
be pretty much anything. Have you tried booting an initrd with no disk
access?
Ben.
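
The "printk()s all over the place" approach can be sketched as follows.
This is only a userspace illustration under stated assumptions: fprintf()
stands in for printk(), and checkpoint(), CHECKPOINT() and
start_init_demo() are hypothetical names that do not exist in the kernel.
The idea is that the last checkpoint seen on the console marks the region
in which the machine stopped.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace stand-in for kernel checkpoints. In kernel code the fprintf()
 * below would be a printk() call; everything else here is hypothetical.
 */
static char last_checkpoint[128]; /* most recent checkpoint reached */
static int checkpoint_count;      /* number of checkpoints passed */

static void checkpoint(const char *func, int line)
{
	assert(func != NULL);
	snprintf(last_checkpoint, sizeof(last_checkpoint), "%s:%d", func, line);
	checkpoint_count++;
	/* Flush at once so the message is out before a possible hang. */
	fprintf(stderr, "CHK %s\n", last_checkpoint);
	fflush(stderr);
}

/* Records the function and line of the place where it is used. */
#define CHECKPOINT() checkpoint(__func__, __LINE__)

/*
 * Hypothetical code path with three steps. fail_at_step simulates the
 * point where the machine would stop responding (0 = never).
 */
static int start_init_demo(int fail_at_step)
{
	CHECKPOINT();                 /* reached step 1 */
	if (fail_at_step == 1)
		return -1;
	CHECKPOINT();                 /* reached step 2 */
	if (fail_at_step == 2)
		return -1;
	CHECKPOINT();                 /* reached step 3 */
	return 0;
}
```

Calling start_init_demo(2) prints two "CHK" lines and then stops, so the
suspect region lies between the second and third checkpoint. New
checkpoints are then added inside that region and the test is repeated.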
* Re: Kernel locks up after calling kernel_execve()
2007-11-11 3:55 ` Benjamin Herrenschmidt
@ 2007-11-13 21:23 ` Gerhard Pircher
2007-11-13 21:43 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-13 21:23 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Sun, 11 Nov 2007 14:55:40 +1100
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
>
> On Sat, 2007-11-10 at 18:11 +0100, Gerhard Pircher wrote:
> > -------- Original Message --------
> > > Date: Fri, 09 Nov 2007 18:50:29 +1100
> > > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > To: Gerhard Pircher <gerhard_pircher@gmx.net>
> > > CC: linuxppc-dev@ozlabs.org
> > > Subject: Re: Kernel locks up after calling kernel_execve()
> >
> > Is there a way to debug it without a hardware debugger or can you
> > recommend a cheap hardware debugger?
>
> There are ways, sure, which probably involve adding printk()s all over the
> place to figure it out... could be some DMA issue for example, could be
> pretty much anything. Have you tried booting an initrd with no disk
> access ?
I tried booting with a ramdisk, but that didn't help much. It still locks up
while loading an init program or after entering some commands in the
sh shell. It looks like the problem is hidden deep in the kernel.
Thanks!
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-13 21:23 ` Gerhard Pircher
@ 2007-11-13 21:43 ` Benjamin Herrenschmidt
2007-11-13 22:06 ` Gerhard Pircher
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-13 21:43 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
On Tue, 2007-11-13 at 22:23 +0100, Gerhard Pircher wrote:
> > There are ways, sure, which probably involve adding printk()s all over
> > the place to figure it out... could be some DMA issue for example, could
> > be pretty much anything. Have you tried booting an initrd with no disk
> > access ?
> I tried to boot with a ramdisk, but that didn't help much. It still
> locks up while loading an init program or after entering some commands
> in the sh shell. Looks like the problem is hidden deep in the kernel.
Well, at least the above tells us it's not DMA related.
I don't know of any deeply hidden problem, so you are probably hitting
something else... If you have disabled idle, then it may be useful to
try instrumenting locks or irq enable/disable.
Also, did you try booting with all kernel debug options enabled?
Finally, since the problem seems to have started around a specific kernel
version, can you try to bisect to find the patch that causes it?
Ben.
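
For readers unfamiliar with bisecting: the idea is a binary search over
the history between a known-good and a known-bad kernel, which is what
`git bisect` automates. The sketch below only illustrates the search
logic. Revisions are modelled as plain integers, and bisect_first_bad()
and demo_is_bad() are hypothetical helpers, not part of git or the kernel.

```c
#include <assert.h>

/* Returns nonzero if the given revision shows the bug. */
typedef int (*is_bad_fn)(int rev, void *ctx);

/*
 * Find the first bad revision in (good, bad], given that `good` is known
 * to work and `bad` is known to fail. Revisions are modelled as integers
 * in history order. *steps receives the number of test runs needed.
 */
static int bisect_first_bad(int good, int bad, is_bad_fn is_bad,
			    void *ctx, int *steps)
{
	assert(good < bad);
	*steps = 0;
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		(*steps)++;
		if (is_bad(mid, ctx))
			bad = mid;      /* bug already present at mid */
		else
			good = mid;     /* bug introduced after mid */
	}
	return bad;
}

/* Hypothetical test: every revision at or after *ctx is broken. */
static int demo_is_bad(int rev, void *ctx)
{
	return rev >= *(int *)ctx;
}
```

With 100 revisions between the good and the bad kernel, the search needs
about seven build-and-boot cycles rather than one hundred, which is why
bisecting is practical even when each test means booting real hardware.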
* Re: Kernel locks up after calling kernel_execve()
2007-11-13 21:43 ` Benjamin Herrenschmidt
@ 2007-11-13 22:06 ` Gerhard Pircher
2007-11-13 23:37 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-13 22:06 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Wed, 14 Nov 2007 08:43:38 +1100
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
> Well, at least the above tells us it's not DMA related.
>
> I don't know of any deeply hidden problem, so you are probably hitting
> something else ... if you have disabled idle, then it may be useful to
> try instrumenting locks or irq enable/disable.
Well, I only disabled power saving with powersave=off. Are there any other
ways to disable idle? What do you mean by instrumenting locks or
irq enable/disable?
> Also, did you try booting with all kernel debug options enabled ?
I compiled in almost all kernel debugging options and booted the kernel
with driver_debug, initcall_debug and debug. I haven't noticed any serious
error messages so far. I'm not sure, however, whether I missed a debug option.
> Finally, since the problem seems to have started around a specific kernel
> version, can you try to bisect the patch that causes it ?
Hmm, I'm not sure how to do this (I have only worked on platform code so far).
I guess you mean checking out a kernel version from the git
repository that doesn't contain the patch for kernel_execve().
I still suspect the kernel_execve() function (which was introduced in
2.6.17), because the kernel locks up after starting the first user program.
AFAIK kernel threads should be running much earlier.
Thanks!
regards,
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-13 22:06 ` Gerhard Pircher
@ 2007-11-13 23:37 ` Benjamin Herrenschmidt
2007-11-14 9:39 ` Gerhard Pircher
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-13 23:37 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
On Tue, 2007-11-13 at 23:06 +0100, Gerhard Pircher wrote:
> Well, I only disabled power saving with powersave=off. Are there any
> other ways to disable idle? What do you mean with instrumenting locks
> or irq enable/disable?
Add printk's to things :-) It's a UP kernel so there should be no
spinlocks anyway.
Best is to try to get a 100% reproducible test case and printk your way
toward the origin of the problem if you don't have a HW debugger. Unless
you manage to sneak in an irq to xmon, but if you are totally locked up,
you probably can't.
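
One possible reading of "instrumenting irq enable/disable" is to record
where interrupts were last turned off, so that a section which never
turns them back on can be identified. The following is a userspace model
only: struct irq_trace, TRACED_IRQ_DISABLE() and the other names are
hypothetical, and on a machine that is completely locked up the location
would have to be printed at disable time rather than reported afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record of the most recent interrupt-disable site. */
struct irq_trace {
	int disabled;       /* 1 while "interrupts" are off */
	const char *file;   /* where they were last turned off */
	int line;
};

static struct irq_trace irq_trace_state;

static void traced_irq_disable(const char *file, int line)
{
	assert(file != NULL);
	irq_trace_state.disabled = 1;
	irq_trace_state.file = file;
	irq_trace_state.line = line;
}

static void traced_irq_enable(void)
{
	irq_trace_state.disabled = 0;
}

/* Wrap the call sites so that each one records its own location. */
#define TRACED_IRQ_DISABLE() traced_irq_disable(__FILE__, __LINE__)
#define TRACED_IRQ_ENABLE()  traced_irq_enable()

/* Report a section that was left with interrupts still disabled. */
static int irq_trace_report(void)
{
	if (irq_trace_state.disabled) {
		fprintf(stderr, "irqs still off, disabled at %s:%d\n",
			irq_trace_state.file, irq_trace_state.line);
		return 1;
	}
	return 0;
}
```

irq_trace_report() returns 1 and names the file and line if the most
recent disable was never matched by an enable, and 0 otherwise.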
Could also be something you do that your buggy northbridge doesn't like.
For example, maybe it dislikes the M bit in the hash table and you end up
with it set due to other reasons (I know we had changes in this area).
> > Also, did you try booting with all kernel debug options enabled ?
> I compiled in almost all kernel debugging options and booted the
> kernel with driver_debug, initcall_debug and debug. I didn't notice any
> serious error messages so far. Not sure however, if I missed a debug
> option.
>
> > Finally, since the problem seems to have started around a specific
> > kernel version, can you try to bisect the patch that causes it ?
> Hmm, I'm not sure how to do this (only worked on platform code so
> far). I guess you think about checking out a kernel version from the
> git repository, which doesn't contain the patch for kernel_execve().
> I still suspect the kernel_execve() function (which was introduced in
> 2.6.17) because the kernel locks up after starting the first user
> program. AFAIK kernel threads should be running much earlier.
They are, but they cause a lot less MMU pressure; that could be an
indication...
Ben.
* Re: Kernel locks up after calling kernel_execve()
2007-11-13 23:37 ` Benjamin Herrenschmidt
@ 2007-11-14 9:39 ` Gerhard Pircher
2007-11-14 10:04 ` Benjamin Herrenschmidt
2007-11-14 21:54 ` Paul Mackerras
0 siblings, 2 replies; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-14 9:39 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Wed, 14 Nov 2007 10:37:52 +1100
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
> Add printk's to things :-) It's a UP kernel so there should be no
> spinlocks anyway.
>
> Best is to try to get a 100% reprocase and printk your way toward the
> origin of the problem if you don't have a HW debugger. Unless you manage
> to sneak in an irq to xmon but if you are totally locked up, you
> probably can't.
xmon also seems to lock up the machine. I was able to activate it through
the magic sysrq key, but the machine died afterwards.
> Could also be something you do that your buggy northbridge doesn't like.
> For example, maybe it dislikes M bit in the hash table and you end up
> with it set due to other reasons (I know we had changes in this area).
Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
prefetch engines (I don't care about the performance loss).
I couldn't find any other code that sets the M bit, except for huge TLB
page support, but isn't that only for PPC64?
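
The masking described above can be illustrated with a small standalone
sketch. It is a model under stated assumptions, not the kernel's code:
the DEMO_* constants use made-up bit values that merely stand in for
CPU_FTR_NEED_COHERENT and the M (memory coherence) bit of a page table
entry, and both functions are hypothetical.

```c
#include <assert.h>

/* Illustrative bit values only; the real kernel definitions differ. */
#define DEMO_FTR_NEED_COHERENT 0x0001u /* stands in for CPU_FTR_NEED_COHERENT */
#define DEMO_PTE_COHERENT      0x0010u /* stands in for the M bit of a PTE */
#define DEMO_PTE_PRESENT       0x0001u

/* Platform setup: drop the "needs coherent mappings" feature. */
static unsigned int demo_mask_coherent(unsigned int cpu_features)
{
	return cpu_features & ~DEMO_FTR_NEED_COHERENT;
}

/* Mapping code: add the M bit only when the feature is advertised. */
static unsigned int demo_pte_flags(unsigned int cpu_features)
{
	unsigned int flags = DEMO_PTE_PRESENT;

	if (cpu_features & DEMO_FTR_NEED_COHERENT)
		flags |= DEMO_PTE_COHERENT;
	assert(flags & DEMO_PTE_PRESENT);
	return flags;
}
```

In this model, a code path that sets the M bit without consulting the
feature word would bypass the platform's masking, which is the kind of
change worth searching for.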
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-14 9:39 ` Gerhard Pircher
@ 2007-11-14 10:04 ` Benjamin Herrenschmidt
2007-11-14 10:15 ` Gerhard Pircher
2007-11-14 21:54 ` Paul Mackerras
1 sibling, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-14 10:04 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
On Wed, 2007-11-14 at 10:39 +0100, Gerhard Pircher wrote:
> Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
> masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
> prefetch engines (I don't care about the performance loss).
> I couldn't find any other code that sets the M bit, except for huge TLB
> page support, but isn't that only for PPC64?
Right, it's 64-bit only. Have you double-checked that nothing broke the M bit
handling? If that's fine, I don't know what else it could be...
Ben.
* Re: Kernel locks up after calling kernel_execve()
2007-11-14 10:04 ` Benjamin Herrenschmidt
@ 2007-11-14 10:15 ` Gerhard Pircher
0 siblings, 0 replies; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-14 10:15 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Wed, 14 Nov 2007 21:04:57 +1100
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
> On Wed, 2007-11-14 at 10:39 +0100, Gerhard Pircher wrote:
> > Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
> > masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
> > prefetch engines (I don't care about the performance loss).
> > I couldn't find any other code that sets the M bit, except for huge TLB
> > page support, but isn't that only for PPC64?
>
> Right, it's only 64 bits. You've double checked nothing broke the M bit
> thing ? In which case, I don't know what else...
Yes, I did. Otherwise the machine dies much earlier in the boot process.
Gerhard
* Re: Kernel locks up after calling kernel_execve()
2007-11-14 9:39 ` Gerhard Pircher
2007-11-14 10:04 ` Benjamin Herrenschmidt
@ 2007-11-14 21:54 ` Paul Mackerras
2007-11-15 8:48 ` Gerhard Pircher
1 sibling, 1 reply; 15+ messages in thread
From: Paul Mackerras @ 2007-11-14 21:54 UTC (permalink / raw)
To: Gerhard Pircher; +Cc: linuxppc-dev
Gerhard Pircher writes:
> Yeah, the northbridge hates the M bit! Thus the AmigaOne platform code
Wow.
> masks out the CPU_FTR_NEED_COHERENT flag and disables the L2 cache
> prefetch engines (I don't care about the performance loss).
> I couldn't find any other code that sets the M bit, except for huge TLB
> page support, but isn't that only for PPC64?
No it's not just for ppc64. We had a patch that went in some time ago
to ensure that the M bit was set on various 32-bit platforms because
otherwise we got data corruption (due to a small cache in the
northbridge not being kept coherent with the processor cache).
Look for CPU_FTR_NEED_COHERENT in include/asm-powerpc/cputable.h and
arch/powerpc/mm/*.
Paul.
* Re: Kernel locks up after calling kernel_execve()
2007-11-14 21:54 ` Paul Mackerras
@ 2007-11-15 8:48 ` Gerhard Pircher
0 siblings, 0 replies; 15+ messages in thread
From: Gerhard Pircher @ 2007-11-15 8:48 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
-------- Original Message --------
> Date: Thu, 15 Nov 2007 08:54:32 +1100
> From: Paul Mackerras <paulus@samba.org>
> To: "Gerhard Pircher" <gerhard_pircher@gmx.net>
> CC: benh@kernel.crashing.org, linuxppc-dev@ozlabs.org
> Subject: Re: Kernel locks up after calling kernel_execve()
> No it's not just for ppc64. We had a patch that went in some time ago
> to ensure that the M bit was set on various 32-bit platforms because
> otherwise we got data corruption (due to a small cache in the
> northbridge not being kept coherent with the processor cache).
Ah, I thought this was due to a CPU erratum where L2 cache prefetching
causes data corruption when the coherent bit is set to 0.
Gerhard