All of lore.kernel.org
 help / color / mirror / Atom feed
From: William Lee Irwin III <wli@holomorphy.com>
To: "Todd R. Eigenschink" <todd@tekinteractive.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
Date: Mon, 20 May 2002 15:36:13 -0700	[thread overview]
Message-ID: <20020520223613.GD2046@holomorphy.com> (raw)
In-Reply-To: <200205160528.g4G5S631019167@sol.mixi.net> <15587.42492.25950.446607@rtfm.ofc.tekinteractive.com> <15592.62193.715212.569689@rtfm.ofc.tekinteractive.com> <20020520170059.GA2046@holomorphy.com> <15593.23568.756199.612888@rtfm.ofc.tekinteractive.com>

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Someone else had suggested testing the memory and power supply.
> memtest86 is easy to run, so I'll try that.  It'll have to be tonight,
> now.

Bitflips are usually things where a pointer turns up invalid (or
non-NULL) and the difference between it and a valid pointer (or NULL)
is one bit. I don't see that here and don't like blaming hardware.


On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Well, after my posting from earlier today, I recompiled the kernel
> after stripping some more stuff.  I just induced an oops in that one,
> so I can list the call stack for it.

Nice, I presume you've got -g there? Any chance of doing something like
objdump --disassemble --source vmlinux and sending me the annotated
disassembly of __wake_up()? I want to doublecheck something...


On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> No IDE stuff this time; this looks a lot like most of the other ones
> I've seen.  This morning was the first time I've ever seen IDE stuff
> in the post-oops call stack.

This is pretty strange, yes.


On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> It seems I can pretty much induce them at will, now.  I started up
> four simultaneous Webtrends sessions, which grow fairly quickly to
> 400-600 MB each, give or take.  (The machine has 2 GB of RAM, so it
> only swaps a little, sometimes.)  Within half an hour, it fell over.
> Here's the oops itself, then the gdb output.

Great stuff! Thanks.


On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Oops: 0000
> CPU:    1
> EIP:    0010:[<c0116363>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010087
> eax: c2802db4   ebx: c2002db4   ecx: 00000000   edx: 00000003
> esi: c2802db0   edi: c2802db0   ebp: f7bf3ee8   esp: f7bf3ecc
> ds: 0018   es: 0018   ss: 0018

Okay, %ecx is 0 -- no bitflip, just plain old NULL...


On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Code;  c0116363 <__wake_up+3b/c0>
> 00000000 <_EIP>:
> Code;  c0116363 <__wake_up+3b/c0>   <=====
>    0:   8b 01                     mov    (%ecx),%eax   <=====
> Code;  c0116365 <__wake_up+3d/c0>
>    2:   85 45 fc                  test   %eax,0xfffffffc(%ebp)
> Code;  c0116368 <__wake_up+40/c0>
>    5:   74 66                     je     6d <_EIP+0x6d> c01163d0 <__wake_up+a8/c

Okay, the offending instruction is mov (%ecx), %eax -- dereferencing the
NULL %ecx...


On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> (gdb) list *__wake_up+0x3b
> 0x973 is in __wake_up (sched.c:731).
> 726                     unsigned int state;
> 727                     wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
> 728
> 729                     CHECK_MAGIC(curr->__magic);
> 730                     p = curr->task;
> 731                     state = p->state;
> 732                     if (state & mode) {
> 733                             WQ_NOTE_WAKER(curr);
> 734                             if (try_to_wake_up(p, sync) && (curr->flags&WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
> 735                                     break;


This makes it pretty clear the offending instruction corresponds to the
first dereference of curr->task. Someone's leaving a NULL pointer in
there when they shouldn't. So this entire call chain has nothing to do
with the offender -- it only trips over the bad pointer the offending
code left behind. This looks like a PITA. The objdump --disassemble
--source stuff is just to have the assembly and source next to each
other for a "more convincing" demonstration, not that this isn't already
pretty good as it stands. Of course, finding the offender will be painful.


Cheers,
Bill

  reply	other threads:[~2002-05-20 22:36 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200205160528.g4G5S631019167@sol.mixi.net>
2002-05-16 12:28 ` Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0) Todd R. Eigenschink
2002-05-16 19:38   ` William Lee Irwin III
2002-05-20 12:58   ` Todd R. Eigenschink
2002-05-20 17:00     ` William Lee Irwin III
2002-05-20 20:26       ` Todd R. Eigenschink
2002-05-20 22:36         ` William Lee Irwin III [this message]
2002-05-20 23:07           ` Todd R. Eigenschink
2002-05-20 23:28             ` William Lee Irwin III
2002-05-20 23:59               ` Todd R. Eigenschink

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020520223613.GD2046@holomorphy.com \
    --to=wli@holomorphy.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=todd@tekinteractive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.