Re: 2.6.17-rc2-mm1 - Andrew Morton

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@osdl.org>
To: Martin Bligh <mbligh@google.com>
Cc: linuxppc64-dev@ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: 2.6.17-rc2-mm1
Date: Fri, 28 Apr 2006 01:20:22 -0700	[thread overview]
Message-ID: <20060428012022.7b73c77b.akpm@osdl.org> (raw)
In-Reply-To: <4450F5AD.9030200@google.com>


(I did s/linux-kernel@google.com/linux-kernel@vger.kernel.org/)

Martin Bligh <mbligh@google.com> wrote:
>
> Still crashes in LTP on x86_64:
> (introduced in previous release)
> 
> http://test.kernel.org/abat/29674/debug/console.log

What a mess.  A doublefault inside an NMI watchdog timeout.  I think.  It's
hard to see.  Some CPUs are stuck on a CPU scheduler lock, others seem to
be stuck in flush_tlb_others.  One of these could be a consequence of the
other, or both could be a consequence of something else.

> Different panic on 2-way ppp64  blade, again during LTP.
> 
> http://test.kernel.org/abat/29675/debug/console.log
> 
>   Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=128 NUMA
> Modules linked in: evdev joydev st sr_mod ipv6 usbcore sg dm_mod
> NIP: C000000000048F0C LR: C0000000000AF854 CTR: 800000000000A984
> REGS: c0000000074af560 TRAP: 0300   Not tainted  (2.6.17-rc2-mm1-autokern1)
> MSR: 8000000000001032 <ME,IR,DR>  CR: 24002024  XER: 00000010
> DAR: C00001800056B0B0, DSISR: 0000000040010000
> TASK = c000000007460800[84] 'kswapd0' THREAD: c0000000074ac000 CPU: 1
> GPR00: 8000000000001032 C0000000074AF7E0 C000000000691420 C0000000007586A8
> GPR04: 000000000000000F 0000000000000000 0000000000000000 0000000000000000
> GPR08: C0000000FE80AAD8 C00001800056B080 0000000000000001 C0000000007586A8
> GPR12: 0000000024002024 C00000000056B280 0000000000000020 0000000000000020
> GPR16: 0000000000000020 0000000000000000 0000000000000000 000000000000000F
> GPR20: C0000000074AF860 0000000000000000 C0000000FFFF3098 0000000000000001
> GPR24: C0000000074AFE00 C00000000059FCC0 0000000000000001 C0000000007586A8
> GPR28: C000000000545680 0000000000000022 C0000000005A4DA8 C00000000056B080
> NIP [C000000000048F0C] .try_to_wake_up+0x98/0x598
> LR [C0000000000AF854] .add_to_swapped_list+0x23c/0x264
> Call Trace:
> [C0000000074AF7E0] [C0000000005A4DA8] 0xc0000000005a4da8 (unreliable)
> [C0000000074AF8F0] [C0000000000AF854] .add_to_swapped_list+0x23c/0x264
> [C0000000074AF990] [C000000000098290] .remove_mapping+0x88/0x174
> [C0000000074AFA20] [C000000000099340] .shrink_zone+0xc74/0xf9c
> [C0000000074AFD30] [C00000000009A008] .kswapd+0x3e4/0x54c
> [C0000000074AFED0] [C0000000000705C8] .kthread+0x174/0x1c4
> [C0000000074AFF90] [C000000000024AB0] .kernel_thread+0x4c/0x68
> Instruction dump:
> 3a810080 7d2000a6 79208042 f9340000 78008000 7c010164 e97b0008 ebfe8008
> eb9e8000 812b0010 79294da4 7d29fa14 <e8090030> 7fbc0214 7fa3eb78 4841f615
> -- 0:conmux-control -- time-stamp -- Apr/27/06  5:10:48 --

Well that's silly.  kswapd died trying to wake up kprefetchd.  That code's
bog-simple, so I'd assume something's gone wrong with a CPU scheduler data
structure.  So if there's a common strand here, it's breakage of sched data
structures by mtest01.   Let me see if I can provoke it here.

next prev parent reply	other threads:[~2006-04-28  8:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-27 16:47 2.6.17-rc2-mm1 Martin Bligh
2006-04-28  8:20 ` Andrew Morton [this message]
2006-05-01 14:24   ` 2.6.17-rc2-mm1 Martin J. Bligh
2006-05-01 17:07     ` 2.6.17-rc2-mm1 Andrew Morton
2006-05-01 17:14       ` 2.6.17-rc2-mm1 Martin Bligh
2006-05-01 17:19       ` 2.6.17-rc2-mm1 Badari Pulavarty
2006-05-01 17:26         ` 2.6.17-rc2-mm1 Martin Bligh
2006-05-01 17:55           ` 2.6.17-rc2-mm1 Badari Pulavarty
2006-05-01 17:57             ` 2.6.17-rc2-mm1 Martin Bligh
2006-05-01 18:32               ` 2.6.17-rc2-mm1 Andy Whitcroft
2006-05-01 23:29                 ` 2.6.17-rc2-mm1 Badari Pulavarty
2006-05-01 18:34     ` 2.6.17-rc2-mm1 Andi Kleen
2006-05-02 13:20       ` 2.6.17-rc2-mm1 Andy Whitcroft
  -- strict thread matches above, loose matches on Subject: below --
2006-04-27 16:50 2.6.17-rc2-mm1 Martin Bligh
2006-04-27 16:54 2.6.17-rc2-mm1 Martin Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060428012022.7b73c77b.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc64-dev@ozlabs.org \
    --cc=mbligh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).