mmu problems


* mmu problems
@ 2001-07-13 18:42 Adam Wozniak
  2001-07-13 20:36 ` Dan Malek
  2001-07-27  6:53 ` Peter Ryser
  0 siblings, 2 replies; 7+ messages in thread
From: Adam Wozniak @ 2001-07-13 18:42 UTC
  To: linuxppc-embedded


OK, I'm working on porting Linux to a custom 8260 board.
For a while I thought I was having problems with the network
layer, because the machine would lock up after I logged in
via telnet and tried to execute a command.

So I slowly pared things down, and got it to lock up at the console.
I found that if I logged in to the console and started a number of
ls -laR &
(about 5 running simultaneously), things locked up.

Sometimes I get a nice register dump; sometimes things just freeze.
When I do get a dump, the stack trace is always somewhere in the
memory-management code (example below).

I suspect there's something about the MMU that's not being configured
properly, but I don't really know where to start.

Help?

(8260, PPCBoot 0.9, linux 2.4.4, serial console on SMC2, ether on FCC3 )

--Adam

Oops: Exception in kernel mode, sig: 4
NIP: C0005F80 XER: 00000000 LR: C000CF9C SP: C3417D30 REGS: c3417c80
TRAP: 0700 MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c3416000[22] 'ls' Last syscall: 6
last math c3416000 last altivec 00000000
GPR00: C0023B74 C3417D30 C3416000 C3400000 00000080 00000081 C33FF000 68616E67
GPR08: 65640000 50726F66 C027B060 00000084 44442482 1004BE90 00000001 7FFFF9D0
GPR16: 3002A570 3002A2B8 7FFFF8C0 7FFFF8C4 00009032 02000000 C3687BA0 C3F6E6DC
GPR24: C000BE90 00000133 C3687C40 C368AEA0 C351FA60 C0271AE0 00000119 C0247FE4
Call backtrace:
C0271AE0 C0023B74 C001F81C C001F99C C000C020 C0003DC4 3000D02C
300050DC 30011058 30003AA4 300039D4 30013544

choice bits from System.map:

c0003db4 T ret_from_fork
c0003dbc T ret_from_intercept
c0003dc4 T ret_from_except
c0003dec T do_bottom_half_ret
c0003e20 T do_signal_ret

c000be1c t m8260_mask_and_ack
c000be78 T m8260_get_irq
c000be90 T do_page_fault
c000c218 T bad_page_fault
c000c270 T va_to_pte
c000c2dc T va_to_phys
c000c328 T print_8xx_pte

c001f674 t do_anonymous_page
c001f79c t do_no_page
c001f928 T handle_mm_fault
c001fa54 T __pmd_alloc
c001fab0 T pte_alloc

c00238a8 t nopage_sequential_readahead
c0023a4c T filemap_nopage
c0023f88 T filemap_sync

c0155440 B sysctl_tcp_mem
c015544c B unix_socket_table
c0155850 A _end




--
Adam Wozniak (KG6GZR)   COM DEV Wireless - Digital and Software Systems
awozniak@comdev.cc      3450 Broad St. 107, San Luis Obispo, CA 93401
                        http://www.comdev.cc
                        Voice: (805) 544-1089       Fax: (805) 544-2055

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: mmu problems
  2001-07-13 18:42 mmu problems Adam Wozniak
@ 2001-07-13 20:36 ` Dan Malek
  2001-07-16 15:58   ` Adam Wozniak
  2001-07-27  6:53 ` Peter Ryser
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Malek @ 2001-07-13 20:36 UTC
  To: Adam Wozniak; +Cc: linuxppc-embedded

Adam Wozniak wrote:

> I suspect there's something about the MMU that's not being configured
> properly, but I don't really know where to start.

Everyone assumes there are bugs in things they don't understand.
While there is always the possibility of something wrong, there are
lots of 8260s running Linux without trouble.  The same MMU code is
used on all 6xx platforms, which includes lots of Macs that have
been running for years.

I suspect you have a memory controller or other hardware design
problem with your board.  I have worked with many hardware engineers
debugging memory interface problems once Linux is running.  There
just aren't any diagnostics that exercise a system like Linux does.
The combination of cache, burst cycles, DMA, and other simultaneous
resource use always shows design/implementation bugs you have never
seen before.  You are describing the classic symptoms some of us have
seen many times before.

Adam Wozniak wrote:

> NIP: C0005F80 XER: 00000000 LR: C000CF9C SP: C3417D30 REGS: c3417c80
> TRAP: 0700 MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11

You forgot the most important function from System.map, the one
at NIP.  Also, that MSR is bogus.  Somewhere, something was fetched
from memory that had bit 12 set.  We never do this in software, and
the hardware should always have it cleared.

	-- Dan


* Re: mmu problems
  2001-07-13 20:36 ` Dan Malek
@ 2001-07-16 15:58   ` Adam Wozniak
  2001-07-16 16:56     ` Adam Wozniak
  0 siblings, 1 reply; 7+ messages in thread
From: Adam Wozniak @ 2001-07-16 15:58 UTC
  To: Dan Malek, James F Dougherty, linuxppc-embedded


Dan Malek wrote:
>
> I suspect you have a memory controller or other hardware design
> problem with your board.

XXXXX wrote:
>
> Sounds like either a memory controller or bridge
> initialization problem. Have you run other software on
> your board? VxWorks? a simple ROM monitor?
>
> Have you written a loop with MSR_IR and MSR_DR off which
> simply walks through all of memory? Will it hang then?
> Sometimes, a memory controller which is misbehaving will
> corrupt memory, and then you may have some problems with
> bogus instructions being executed.

I'm more likely to suspect that it is something I've done.  The piece of
hardware I'm using normally runs VxWorks, and we have several in the
field which have been running for some time with excellent performance
records.

I do have a simple ROM monitor (PPCBoot) running, and it seems to work
flawlessly.

I'll try today to get more (and better) error dumps from it.

--Adam

* Re: mmu problems
  2001-07-16 15:58   ` Adam Wozniak
@ 2001-07-16 16:56     ` Adam Wozniak
  2001-07-16 18:10       ` Cal Erickson
  0 siblings, 1 reply; 7+ messages in thread
From: Adam Wozniak @ 2001-07-16 16:56 UTC
  To: Dan Malek, James F Dougherty, linuxppc-embedded


Adam Wozniak wrote:
>
> I'll try today to get more (and better) error dumps from it.

Here are two more dumps.  Both were created (in separate runs)
by starting several (between 4 and 10) copies of
ls -laR &
from a shell prompt.

Is there any way I can get gcc to annotate the assembly listing with the
C source code?  I feel it would help pinpoint what is going on...

====
(Note: the "closest to" lines were produced by a perl script that pulls
symbols from the System.map file.  "plus 96 of 760" means the given
address is 96 bytes past the start of the function, and the next
function starts 760 bytes after the start of this one.)
====

Oops: kernel access of bad area, sig: 11
NIP: C000FA60 XER: 00000000 LR: 00000000 SP: C3E97DE0 REGS: c3e97d30
TRAP: 0300
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 00000000, DSISR: 20000000
TASK = c3e96000[18] 'sh' Last syscall: 2
last math c3e96000 last altivec 00000000
GPR00: 00000000 C3E97DE0 C3E96000 00000000 C33FE000 00000000 C3400540 00000000
GPR08: 00000000 C3400040 C3400000 00000001 24448880 1004BE90 03FDE000 007FFF00
GPR16: 00000000 007FFF00 007FFEB0 03FD71A8 00009032 03E97E80 00000000 C3E97E90
GPR24: 7FFFFD60 C3E97E18 FFFFFFF4 C33FE000 C368C160 00000000 C33FE000 C3D932A0
Call backtrace:
C368C320 C00105FC C0006A4C C0003B5C 00000002 1001B6BC 1001BC20
10027B64 100277E0 0FEDC068 00000000

closest to c000fa60 :: c000fa00 t copy_mm                (plus 96 of 760)
closest to c368c320 :: NOT FOUND
closest to c00105fc :: c00100f8 T do_fork                (plus 1284 of 1952)
closest to c0006a4c :: c0006a2c T sys_fork               (plus 32 of 48)
closest to c0003b5c :: c0003b5c T ret_from_syscall_1     (plus 0 of 180)

Machine check in kernel mode.
Caused by (from SRR1=41000): Transfer error ack signal
Oops: machine check, sig: 7
NIP: 00000908 XER: 00000000 LR: C000CF9C SP: C3E81BF0 REGS: c3e81b40
TRAP: 0200
MSR: 00041000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
TASK = c3e80000[19] 'ls' Last syscall: 3
last math c3442000 last altivec 00000000
GPR00: 7D665B78 C3E81BF0 C3E80000 C33FF800 00000080 00000081 C33FF000 00000001
GPR08: 00000000 000000C8 C027B060 8299A487 44444824 1004BE90 03FDE000 007FFF00
GPR16: 00000000 007FFF00 007FFEB0 C0023424 00009032 03E81D20 00000000 C0003DC4
GPR24: C000BE90 02000000 00030002 C3E74064 C3E74064 C3681CE0 C36831A0 C0247FE4
Call backtrace:
C3681CE0 C001F714 C001F7F0 C001F99C C000C020 C0003DC4 C350DCC0
C0023068 C0023520 C0031FC0 C0003B5C 0FF31FD0 0FF307F0 0FF32040
0FF31EB4 0FF26DF0 0FF26C48 0FF2FDD0 0FE8E6D8 0FE8EB00 0FF6553C
0FF64EF4 10029794 10012B64 10012648 10012718 10012764 10012764
10012764 100133C8 10027B64 100277E0 0FEDC068

closest to c001f714 :: c001f674 t do_anonymous_page     (plus 160 of 296)
closest to c001f7f0 :: c001f79c t do_no_page     (plus 84 of 396)
closest to c001f99c :: c001f928 T handle_mm_fault     (plus 116 of 300)
closest to c000c020 :: c000be90 T do_page_fault     (plus 400 of 904)
closest to c0003dc4 :: c0003dc4 T ret_from_except     (plus 0 of 40)
closest to c350dcc0 :: NOT FOUND
closest to c0023068 :: c0022e14 T do_generic_file_read     (plus 596 of 1552)
closest to c0023520 :: c00234b8 T generic_file_read     (plus 104 of 160)
closest to c0031fc0 :: c0031ef8 T sys_read     (plus 200 of 276)
closest to c0003b5c :: c0003b5c T ret_from_syscall_1     (plus 0 of 180)


* Re: mmu problems
  2001-07-16 16:56     ` Adam Wozniak
@ 2001-07-16 18:10       ` Cal Erickson
  2001-07-17 23:29         ` Adam Wozniak
  0 siblings, 1 reply; 7+ messages in thread
From: Cal Erickson @ 2001-07-16 18:10 UTC
  To: Adam Wozniak; +Cc: Dan Malek, James F Dougherty, linuxppc-embedded


Adam,
Here is one way to get C code interspersed with the assembly:
'-Wa,-ahls=test.lst'   The file test.lst will contain the assembly
and the C source. It also helps to add -g to the compiler command
line. Quoting the option is harmless; the -Wa, prefix is what tells
the compiler to pass it through to the assembler phase.

Cal

Adam Wozniak wrote:

> Is there any way I can get gcc to annotate the assembly listing with the
> C source code?  I feel it would help pinpoint what is going on...

--
===========================================================================
Cal Erickson                 MontaVista Software Inc.
Linux Consultant             1237 E. Arques Ave.
Phone (408) 328-0304         Sunnyvale CA 94085
Fax   (408) 328-9204         web http://www.mvista.com
eCode: http://cal@work.com.ecode.com
===========================================================================


* Re: mmu problems
  2001-07-16 18:10       ` Cal Erickson
@ 2001-07-17 23:29         ` Adam Wozniak
  0 siblings, 0 replies; 7+ messages in thread
From: Adam Wozniak @ 2001-07-17 23:29 UTC
  To: cal_erickson; +Cc: Dan Malek, James F Dougherty, linuxppc-embedded


I'm still completely stuck.  Any help would be greatly appreciated.
Every crash in the last 2 days has been pretty much this:


Oops: Exception in kernel mode, sig: 4
NIP: C0005F80 XER: 00000000 LR: C000CF9C SP: C34B3BF0 REGS: c34b3b40
TRAP: 0700 MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c34b2000[20] 'ls' Last syscall: 3
last math c3472000 last altivec 00000000
GPR00: C001F714 C34B3BF0 C34B2000 C3400000 00000080 00000081 C33FF000 00000001
GPR08: 00000000 000000C9 C027B060 00000000 44444824 1004BE90 03FDE000 007FFF00
GPR16: 00000000 007FFF00 007FFEB0 C002343C 00009032 034B3D20 00000000 C0003DC4
GPR24: C000BE90 02000000 00030002 C34A6064 C34A6064 C35404A0 C3683320 C0247FE4
Call backtrace:
C35404A0 C001F714 C001F7F0 C001F99C C000C020 C0003DC4 C352FCC0
C0023080 C0023538 C0031FD4 C0003B5C 0FF31FD0 0FF307F0 0FF32040
0FF31EB4 0FF26DF0 0FF26C48 0FF2FDD0 0FE8E6D8 0FE8EB00 0FF6553C
0FF64EF4 10029794 10012B64 10012648 10012718 10012764 10012764
10012764 100133C8 10027B64 100277E0 0FEDC068

closest to c0005f80 :: c0005f50 T __flush_page_to_ram     (plus 48 of 76)
closest to c35404a0 :: NOT FOUND
closest to c001f714 :: c001f674 t do_anonymous_page     (plus 160 of 296)
closest to c001f7f0 :: c001f79c t do_no_page     (plus 84 of 396)
closest to c001f99c :: c001f928 T handle_mm_fault     (plus 116 of 300)
closest to c000c020 :: c000be90 T do_page_fault     (plus 400 of 904)
closest to c0003dc4 :: c0003dc4 T ret_from_except     (plus 0 of 40)
closest to c352fcc0 :: NOT FOUND
closest to c0023080 :: c0022e2c T do_generic_file_read     (plus 596 of 1552)
closest to c0023538 :: c00234d0 T generic_file_read     (plus 104 of 160)
closest to c0031fd4 :: c0031f0c T sys_read     (plus 200 of 276)
closest to c0003b5c :: c0003b5c T ret_from_syscall_1     (plus 0 of 180)

_GLOBAL(__flush_page_to_ram)
        mfspr   r5,PVR
        rlwinm  r5,r5,16,16,31
        cmpi    0,r5,1
        beqlr                           /* for 601, do nothing */
        rlwinm  r3,r3,0,0,19            /* Get page base address */
        li      r4,4096/CACHE_LINE_SIZE /* Number of lines in a page */
        mtctr   r4
        mr      r6,r3
0:      dcbst   0,r3                    /* Write line to ram */
        addi    r3,r3,CACHE_LINE_SIZE
        bdnz    0b
        sync
        mtctr   r4              /* !!!! this is __flush_page_to_ram +48 !!!! */
1:      icbi    0,r6
        addi    r6,r6,CACHE_LINE_SIZE
        bdnz    1b
        sync
        isync
        blr


* Re: mmu problems
  2001-07-13 18:42 mmu problems Adam Wozniak
  2001-07-13 20:36 ` Dan Malek
@ 2001-07-27  6:53 ` Peter Ryser
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Ryser @ 2001-07-27  6:53 UTC
  To: linuxppc-embedded

Hi Adam,

not a solution for your real problem, but I hope it helps anyway.

> Is there any way I can get gcc to annotate the assembly listing with the
> C source
> code?  I feel it would help pinpoint what is going on...

objdump --source vmlinux

produces a mixed C/assembly listing.  It works only if the kernel was
compiled with debugging information included.

- Peter
