Crash on MPC855T with 2.2.14

linuxppc-dev.lists.ozlabs.org archive mirror

* Crash on MPC855T with 2.2.14
@ 2004-05-26 22:09 Marcelo Tosatti
  2004-05-26 22:20 ` Help with crash " Marcelo Tosatti
  0 siblings, 1 reply; 2+ messages in thread
From: Marcelo Tosatti @ 2004-05-26 22:09 UTC (permalink / raw)
  To: linuxppc-embedded; +Cc: Nei A. Chiaradia, Edson Seabra

Hi PPC fellows,

We are facing a crash under high load on our TS console servers (2.2.14 based).

The test used to reproduce the crash involves running SSH connection
attempts in a loop from a fast host. After one or two hours of testing,
the crash happens. It's still possible to ping the box and it responds to
typed keys, but that's all. The kernel is looping in the page fault handling
code as follows, as observed with a BDI2000 and gdb:

(gdb) cont
Continuing.

(locked here, so I type "ctrl+c" on the gdb session).

Program received signal SIGSTOP, Stopped (signal).
local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
549             asm volatile ("tlbia" : : );
(gdb) bt
#0  local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
#1  0xc0019368 in handle_mm_fault (tsk=0xce95e000, vma=0xce678200,
    address=2147481140, write_access=33554432) at memory.c:918
Cannot access memory at address 0xce95fca0
(gdb) cont
Continuing.

And it keeps taking faults on this address (7FFFF634 in this example,
sometimes also 7FFFF630), which is part of the process's last VMA. Forever.

# cat /proc/1/maps

30023000-30026000 rwxp 00013000 01:00 249        /lib/ld-2.1.3.so
30026000-30027000 rwxp 00000000 00:00 0
7fffe000-80000000 rwxp fffff000 00:00 0
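
For anyone matching the decimal values printed by gdb against the hex
addresses in the text, here is a minimal sketch (not part of the kernel
sources) that checks the faulting address against the last mapping shown
above. The helper names are hypothetical, and the 4 KiB page size is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds of the last VMA, copied from the /proc/1/maps output. */
#define LAST_VMA_START 0x7fffe000u
#define LAST_VMA_END   0x80000000u /* exclusive upper bound */

/* Faulting address exactly as gdb prints it (decimal). */
#define FAULT_ADDR_DEC 2147481140u

/* Hypothetical helper: true if addr lies in [start, end). */
static bool addr_in_vma(uint32_t addr, uint32_t start, uint32_t end)
{
    return addr >= start && addr < end;
}

/* Hypothetical helper: offset of addr inside its page (4 KiB assumed). */
static uint32_t page_offset_4k(uint32_t addr)
{
    return addr & 0xfffu;
}
```

With these definitions, FAULT_ADDR_DEC equals 0x7ffff634, it falls inside
the 7fffe000-80000000 mapping, and its page offset is 0x634.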

The "error_code" passed to "do_page_fault" during this endless loop
is either 0xE (14) or 0x82000000 (2181038080).
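
A small sketch relating these values to the write_access argument seen
in the backtrace (33554432, i.e. 0x02000000). It assumes that
write_access is derived by masking error_code with 0x02000000, which is
consistent with the numbers in this report but is not confirmed here
against the 2.2.14 sources. The macro and function names are
hypothetical.

```c
#include <stdint.h>

/* Assumed store-indicator mask; chosen to match write_access=33554432. */
#define FAULT_STORE_MASK 0x02000000u

/* Hypothetical helper: the write_access value under the assumption above. */
static uint32_t write_access_from(uint32_t error_code)
{
    return error_code & FAULT_STORE_MASK;
}
```

Under this assumption, error_code 0x82000000 yields write_access
0x02000000 (the value in the trace), while error_code 0xE yields 0.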

handle_mm_fault trace for such "unsuccessful pte bringup":

#0  handle_mm_fault (tsk=0xce70c000, vma=0xce6188c0, address=2147481140,
    write_access=33554432) at memory.c:901

903             if (!pte_present(entry)) {
909             entry = pte_mkyoung(entry);
910             set_pte(pte, entry);
911             flush_tlb_page(vma, address);
912             if (write_access) {
913                     if (!pte_write(entry))
303             pte_val(pte) |= _PAGE_DIRTY;
304             if (pte_val(pte) & _PAGE_RW)
305                     pte_val(pte) |= _PAGE_HWWRITE;
918                     flush_tlb_page(vma, address);
916                     entry = pte_mkdirty(entry);
917                     set_pte(pte, entry);
918                     flush_tlb_page(vma, address);
921             return 1;
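
To make the stepped sequence easier to follow, here is a userspace model
of the flag updates it shows (pte_mkyoung, then pte_mkdirty as in lines
303-305). This is only a sketch: the bit values are made up for
illustration and are NOT the real 8xx _PAGE_* definitions, and the
model_* names are hypothetical.

```c
#include <stdint.h>

/* Illustrative bit positions only; not the real 8xx PTE layout. */
#define M_PAGE_PRESENT  0x001u
#define M_PAGE_RW       0x002u
#define M_PAGE_DIRTY    0x004u
#define M_PAGE_ACCESSED 0x008u
#define M_PAGE_HWWRITE  0x010u

typedef uint32_t pte_model_t;

/* Models pte_mkyoung(): mark the entry as accessed. */
static pte_model_t model_mkyoung(pte_model_t pte)
{
    return pte | M_PAGE_ACCESSED;
}

/* Models lines 303-305: set DIRTY, and HWWRITE only if RW is set. */
static pte_model_t model_mkdirty(pte_model_t pte)
{
    pte |= M_PAGE_DIRTY;
    if (pte & M_PAGE_RW)
        pte |= M_PAGE_HWWRITE;
    return pte;
}

/*
 * Models the writable path of the trace (lines 909-918).
 * A non-writable entry is returned after mkyoung only; the real
 * kernel would take a different path there, which is not modelled.
 */
static pte_model_t model_write_fault(pte_model_t pte)
{
    pte = model_mkyoung(pte);
    if (pte & M_PAGE_RW)
        pte = model_mkdirty(pte);
    return pte;
}
```

In this model a present, writable entry leaves the handler with
ACCESSED, DIRTY and HWWRITE all set, so if the real code behaves the same
way the repeated fault would have to come from somewhere other than
these flag updates.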

I should try to figure out why it is faulting. Maybe the pte
is not being set up correctly.

Any hints are welcome.

/proc/cpuinfo
processor       : 0
cpu             : 8xx
clock           : 48MHz
clock           : 48MHz
bus clock       : 48MHz
revision        : 0.0
bogomips        : 47.82
zero pages      : total 0 (0Kb) current: 0 (0Kb) hits: 0/124087 (0%)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: Help with crash on MPC855T with 2.2.14
  2004-05-26 22:09 Crash on MPC855T with 2.2.14 Marcelo Tosatti
@ 2004-05-26 22:20 ` Marcelo Tosatti
  0 siblings, 0 replies; 2+ messages in thread
From: Marcelo Tosatti @ 2004-05-26 22:20 UTC (permalink / raw)
  To: linuxppc-embedded; +Cc: Nei A. Chiaradia, Edson Seabra


Forgot to mention that the same processor (on similar but not identical
hardware) running v2.4 does not crash under the same test.


