kswapd Oops

linuxppc-dev.lists.ozlabs.org archive mirror

* kswapd Oops
@ 2001-01-15  1:36 Timothy Ritchey
  2001-01-15  5:37 ` Dan Malek
  0 siblings, 1 reply; 6+ messages in thread
From: Timothy Ritchey @ 2001-01-15  1:36 UTC (permalink / raw)
  To: linuxppc-embedded

I am running the linuxppc_2_4 snapshot from 01/13/01, booting using
ppcboot on a custom 860T-based board with 64 MB of DRAM. I am getting
periodic Oopses like this:

 Oops: kernel access of bad area, sig: 11
NIP: C002C044 XER: 00000140 LR: C002C024 SP: C3FA3F10 REGS: c3fa3e60
TRAP: 0300
[snip][snip][snip]
Call backtrace:
C015A354 C002C298 C0004E8C

I can still access the shell though.

The call backtrace indicates this is occurring in kswapd. I am trying
to isolate the problem to figure out what needs fixing. Is this a
potential MMU problem, or could it be a problem with the swap setup? I
am not sure what needs to be done about swap on an initrd-based
system. It doesn't seem to me that it should even BE
swapping since there is no disk... :/
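
Note: mapping backtrace addresses to function names, as done above to
reach kswapd, amounts to finding the symbol with the highest address
that is not above the target address. The following is a minimal sketch
of that lookup, assuming a table sorted by address such as one built
from System.map. The table contents and the names sym_a..sym_d are
invented for illustration and do not come from the kernel in this
report; resolve() and print_frame() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry of a System.map-style symbol table. */
struct ksym {
    uint32_t addr;      /* start address of the symbol */
    const char *name;   /* symbol name */
};

/*
 * HYPOTHETICAL table, sorted by ascending address. A real table would
 * be loaded from the System.map that matches the running kernel.
 */
static const struct ksym symtab[] = {
    { 0xC0004E00u, "sym_a" },
    { 0xC002C000u, "sym_b" },
    { 0xC002C200u, "sym_c" },
    { 0xC015A300u, "sym_d" },
};
#define SYMTAB_LEN (sizeof(symtab) / sizeof(symtab[0]))

/*
 * Binary search for the last entry whose address is <= addr.
 * Returns NULL if addr lies below the first symbol.
 */
static const struct ksym *resolve(const struct ksym *tab, size_t n,
                                  uint32_t addr)
{
    const struct ksym *best = NULL;
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr) {
            best = &tab[mid];   /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Print one backtrace frame as "ADDR name+0xOFFSET". */
static void print_frame(uint32_t addr)
{
    const struct ksym *s = resolve(symtab, SYMTAB_LEN, addr);

    if (s)
        printf("%08lX %s+0x%lx\n", (unsigned long)addr, s->name,
               (unsigned long)(addr - s->addr));
    else
        printf("%08lX <unknown>\n", (unsigned long)addr);
}
```

Calling print_frame() on each backtrace address in turn gives the call
chain. The result is only meaningful if the symbol table comes from
exactly the kernel image that produced the Oops.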

Any advice would be appreciated.

Cheers,
tim

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: kswapd Oops
  2001-01-15  1:36 kswapd Oops Timothy Ritchey
@ 2001-01-15  5:37 ` Dan Malek
  2001-01-15 17:16   ` Timothy Ritchey
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Malek @ 2001-01-15  5:37 UTC (permalink / raw)
  To: Timothy Ritchey; +Cc: linuxppc-embedded


Timothy Ritchey wrote:
>
> I am running the linuxppc_2_4 snapshot from 01/13/01, booting using
> ppcboot on a custom 860T-based board with 64 MB of dram. I am getting
> periodic Oops like:

What other versions of the kernel have you used that seem to work OK?

> The call backtrace indicates this is occurring in kswapd. I am trying
> to isolate the problem to figure out what needs fixing. Is this a
> potential MMU problem, or could it be a problem with the swap setup?

What is the silicon revision of the chip?

What were you doing at the time you received this error?


	-- Dan


* Re: kswapd Oops
  2001-01-15  5:37 ` Dan Malek
@ 2001-01-15 17:16   ` Timothy Ritchey
  2001-01-15 17:22     ` Dan Malek
  0 siblings, 1 reply; 6+ messages in thread
From: Timothy Ritchey @ 2001-01-15 17:16 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded

Dan Malek wrote:
>
> What other versions of the kernel have you used that seem to work OK?

So far, this is the only version I have gotten to boot completely. I
have tried the MontaVista 2.2.14-1.2.2_1 kernel, but it crashes after
loading the compressed ramdisk image with:

RAMDISK: Compressed image found at block 0
Kernel panic: VFS: Free block list corrupted
Rebooting in 180 seconds..

>
> What is the silicon revision of the chip?

It is a B5 revision

>
> What were you doing at the time you received this error?

The "Oops: kernel access of bad area, sig: 11" occurs at a couple of
different times. 1) sometimes on boot, it will occur right after kswapd
is enabled, and hang the kernel completely. 2) Sometimes it occurs right
after the:

RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: enabling 8 loop devices
Oops: kernel access of bad area, sig: 11

If it gets past these two spots (which is most of the time) it will get
me to the bash# prompt. Once there, it randomly will Oops, but I can hit
return, and be right back at the command prompt (it will Oops even if I
just leave the machine idle and am not running any commands from the
shell). I just recently came across some additional errors like this:

bash# ls
exec.c:278: bad pte 001709c9.
exit_mmap: map count is 14
Segmentation fault

Thanks for any help.

Cheers,
tim


* Re: kswapd Oops
  2001-01-15 17:16   ` Timothy Ritchey
@ 2001-01-15 17:22     ` Dan Malek
  2001-01-15 18:35       ` Timothy Ritchey
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Malek @ 2001-01-15 17:22 UTC (permalink / raw)
  To: Timothy Ritchey; +Cc: linuxppc-embedded

Timothy Ritchey wrote:

> so far, this is the only version I have gotten to boot completely.

Ahhh.....

> ...... I
> have tried the MontaVista 2.2.14-1.2.2_1 kernel, but it crashes after
> loading the compressed ramdisk image with:

That kernel _works_.  If you can't boot that one, you better start
looking elsewhere for problems.

> It is a B5 revision

Are you including the software workarounds for the CPU6 silicon
errata?

> is enabled, and hang the kernel completely. 2) Sometimes it occurs right
> after the:
>
> RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
> loop: enabling 8 loop devices
> Oops: kernel access of bad area, sig: 11

Yeah, you have some hardware problems.  I suspect if you disable
caches or only run with data cache in writethrough mode you will
make more progress.

The Linux kernel does things you just can't do without some pretty
sophisticated diagnostics.  With the MMU and caches enabled, plus
the CPM running I/O in parallel, you will generate worst case bus
timing to the DRAM that you have never seen before.  The problems
you see are quite typical of a memory controller configuration
or a processor/memory board layout problem.

	-- Dan

--

	I like MMUs because I don't have a real life.


* Re: kswapd Oops
  2001-01-15 17:22     ` Dan Malek
@ 2001-01-15 18:35       ` Timothy Ritchey
  2001-01-15 18:45         ` Dan Malek
  0 siblings, 1 reply; 6+ messages in thread
From: Timothy Ritchey @ 2001-01-15 18:35 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded

Dan Malek wrote:
>
> That kernel _works_.  If you can't boot that one, you better start
> looking elsewhere for problems.

That is what I figured, although I had to make some changes to get the
FEC driver to compile (net/core/dev.c refers to cpm_enet_init but not
fec_enet_init, which I suppose works if you want to use the SCC Ethernet
but does not work for the FEC driver).

> Are you including the software workarounds for the CPU6 silicon
> errata?

yes

> Yeah, you have some hardware problems. <SNIP> The problems
> you see are quite typical of a memory controller configuration
> or a processor/memory board layout problem.

ARRRGGGGHHH. You can't imagine how much this pains me. Well.... perhaps
you can :) I am going to disable everything I can WRT caches, etc. and
see if it clears up.

>         I like MMUs because I don't have a real life.

How appropriate in this situation...

Thanks,
tim


* Re: kswapd Oops
  2001-01-15 18:35       ` Timothy Ritchey
@ 2001-01-15 18:45         ` Dan Malek
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Malek @ 2001-01-15 18:45 UTC (permalink / raw)
  To: Timothy Ritchey; +Cc: linuxppc-embedded

Timothy Ritchey wrote:

> That is what I figured, although I had to make some changes to get the
> FEC driver to compile

Hmmm...you may have grabbed something a little old.  I would suggest
starting with the Embedded Planet "CLLF" LSP.

> > Are you including the software workarounds for the CPU6 silicon
> > errata?
>
> yes

Although it shouldn't be causing the trouble, there is/was a bug
in the code with this option enabled.  Check the file
arch/ppc/kernel/head.S and make sure the 'cpu6_bug' buffer is
declared '.space 16' instead of '.space 4'.

> ARRRGGGGHHH. You can't imagine how much this pains me. Well.... perhaps
> you can :)

Yes, I can.  You are not the first to see this and won't be the
last.  I have been very fortunate to work with some awesome hardware
engineers in the past, where the first round of hardware had
high speed logic connectors to the processor bus.  The MPC8xx
memory controller can do some pretty weird things, all legitimate
and logical, and not something you will comprehend from reading
the manual.  You have to see it in action, especially when running Linux.

> .... I am going to disable everything I can WRT caches, etc. and
> see if it clears up.

Another thing to test is setting burst inhibit in the ORx for the DRAM.
This will prevent the CPM DMA from generating burst cycles as well.
Most people can get the single cycle memory operations working because
the timing isn't as critical and there is lots of overhead so a few
wasted clock cycles are seldom noticed.  Burst mode has to be perfect.
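
Note: the burst inhibit test can be sketched as a single bit set in the
option register of the chip-select bank that drives the DRAM. This is a
sketch under assumptions, not the code from any particular kernel or
boot loader: the mask value 0x00000100 is the burst inhibit bit as
recalled from the MPC860 memory controller documentation and should be
checked against the manual for the part in use, and the helper names
are hypothetical. Which ORx belongs to the DRAM is board-specific, so
the caller supplies the register pointer.

```c
#include <stdint.h>

/*
 * ASSUMPTION: burst inhibit bit of the MPC8xx memory controller option
 * register (bit 23 in big-endian bit numbering). Verify this against
 * the user's manual before relying on it.
 */
#define OR_BI 0x00000100u

/* Return the option register value with burst accesses inhibited. */
static inline uint32_t or_set_burst_inhibit(uint32_t or_val)
{
    return or_val | OR_BI;
}

/* Return the option register value with burst accesses allowed again. */
static inline uint32_t or_clear_burst_inhibit(uint32_t or_val)
{
    return or_val & ~OR_BI;
}

/* Nonzero if the given option register value inhibits bursts. */
static inline int or_burst_inhibited(uint32_t or_val)
{
    return (or_val & OR_BI) != 0;
}

/*
 * Read-modify-write of a memory-mapped ORx. The pointer is
 * board-specific: it must refer to the option register of the bank
 * that serves the DRAM.
 */
static inline void dram_inhibit_bursts(volatile uint32_t *orx)
{
    *orx = or_set_burst_inhibit(*orx);
}
```

A change like this would normally go into the boot loader's memory
controller setup, before the kernel is running out of the DRAM being
reconfigured. If the Oopses stop with bursts inhibited, that points at
the burst timing rather than at the kernel.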

	-- Dan

