public inbox for linux-kernel@vger.kernel.org
* Oops in prune_dcache (2.4.0-prerelease)
@ 2001-01-03  3:39 Udo A. Steinberg
  2001-01-03  3:54 ` Linus Torvalds
  2001-01-03  4:00 ` Alexander Viro
  0 siblings, 2 replies; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03  3:39 UTC
  To: Linus Torvalds, Linux Kernel


Hi Linus et al.

While under massive disk and cpu load, 2.4.0-prerelease produced
the following oops (decoded below).

Keith, I've read the FAQ about being bitten by Makefile bugs with
certain symbols, yet I still get these symbol warnings even after an
mrproper rebuild. Any clues?

-Udo.

ksymoops 2.3.5 on i686 2.4.0-prerelease.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0-prerelease/ (default)
     -m /boot/System.map-2.4.0-prerelease (specified)

Warning (compare_maps): ksyms_base symbol acpi_clear_event_R__ver_acpi_clear_event not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol acpi_cm_memcpy_R__ver_acpi_cm_memcpy not found in System.map.  Ignoring ksyms_base entry

[ ** many other warnings snipped to reduce spam ** ]

Unable to handle kernel paging request at virtual address 01000014
c01419cc
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01419cc>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 01000000   ebx: c20847e0   ecx: c2081d10   edx: c2081d10
esi: c20847c0   edi: c2081d00   ebp: 00002c79   esp: c147bfa4
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 3, stackpage=c147b000)
Stack: 00010f00 00000003 00000004 00000000 c0141cc1 00006ea6 c012a0d3 00000006 
       00000004 00010f00 c023e1f1 c147a239 0008e000 c012a19a 00000004 00000000 
       cffe5fbc c0105000 ffffff9c c01074d3 00000000 c02d6d64 c02a5fdc 
Call Trace: [<c0141cc1>] [<c012a0d3>] [<c023e1f1>] [<c012a19a>] [<c0105000>] [<c01074d3>] 
Code: 8b 40 14 85 c0 74 09 57 56 ff d0 83 c4 08 eb 0c 57 e8 be 1b 

>>EIP; c01419cc <prune_dcache+9c/120>   <=====
Trace; c0141cc1 <shrink_dcache_memory+21/30>
Trace; c012a0d3 <do_try_to_free_pages+53/90>
Trace; c023e1f1 <tvecs+2169/1b124>
Trace; c012a19a <kswapd+8a/140>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c01074d3 <kernel_thread+23/30>
Code;  c01419cc <prune_dcache+9c/120>
00000000 <_EIP>:
Code;  c01419cc <prune_dcache+9c/120>   <=====
   0:   8b 40 14                  movl   0x14(%eax),%eax   <=====
Code;  c01419cf <prune_dcache+9f/120>
   3:   85 c0                     testl  %eax,%eax
Code;  c01419d1 <prune_dcache+a1/120>
   5:   74 09                     je     10 <_EIP+0x10> c01419dc <prune_dcache+ac/120>
Code;  c01419d3 <prune_dcache+a3/120>
   7:   57                        pushl  %edi
Code;  c01419d4 <prune_dcache+a4/120>
   8:   56                        pushl  %esi
Code;  c01419d5 <prune_dcache+a5/120>
   9:   ff d0                     call   *%eax
Code;  c01419d7 <prune_dcache+a7/120>
   b:   83 c4 08                  addl   $0x8,%esp
Code;  c01419da <prune_dcache+aa/120>
   e:   eb 0c                     jmp    1c <_EIP+0x1c> c01419e8 <prune_dcache+b8/120>
Code;  c01419dc <prune_dcache+ac/120>
  10:   57                        pushl  %edi
Code;  c01419dd <prune_dcache+ad/120>
  11:   e8 be 1b 00 00            call   1bd4 <_EIP+0x1bd4> c01435a0 <iput+0/150>


45 warnings issued.  Results may not be reliable.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03  3:39 Udo A. Steinberg
@ 2001-01-03  3:54 ` Linus Torvalds
  2001-01-03  4:04   ` Udo A. Steinberg
  2001-01-03  8:29   ` Dan Aloni
  2001-01-03  4:00 ` Alexander Viro
  1 sibling, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2001-01-03  3:54 UTC
  To: Udo A. Steinberg; +Cc: Linux Kernel



On Wed, 3 Jan 2001, Udo A. Steinberg wrote:
> 
> While under massive disk and cpu load, 2.4.0-prerelease produced
> the following oops (decoded below).

Hmm.. If I'm not mistaken, this is in dentry_iput() (inline function
called by prune_one_dentry(), which is _also_ an inline function, which
is why it gets reported as being in prune_dcache):

	if (dentry->d_op && dentry->d_op->d_iput)
		dentry->d_op->d_iput(dentry, inode);

and it looks like your dentry->d_op has a value of 0x01000000, so when we
load the d_op->d_iput pointer, we get a page fault.

The strange thing is that 0x01000000 value, which almost certainly should
just be NULL. A one-bit error.

Now, I assume this machine has been historically stable, with no history
of memory corruption problems.. It's entirely possible (and likely) that
the one-bit error is due to some wild kernel pointer. Which makes this
_really_ hard to debug.

I'll try to think about it some more, but I'd love to have more reports to
go on to try to find a pattern..

		Linus


* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03  3:39 Udo A. Steinberg
  2001-01-03  3:54 ` Linus Torvalds
@ 2001-01-03  4:00 ` Alexander Viro
  1 sibling, 0 replies; 10+ messages in thread
From: Alexander Viro @ 2001-01-03  4:00 UTC
  To: Udo A. Steinberg; +Cc: Linus Torvalds, Linux Kernel

On Wed, 3 Jan 2001, Udo A. Steinberg wrote:

> 
> Hi Linus et al.
> 
> While under massive disk and cpu load, 2.4.0-prerelease produced
> the following oops (decoded below).
 
> Unable to handle kernel paging request at virtual address 01000014
 
> Code;  c01419cc <prune_dcache+9c/120>   <=====
>    0:   8b 40 14                  movl   0x14(%eax),%eax   <=====
> Code;  c01419cf <prune_dcache+9f/120>
>    3:   85 c0                     testl  %eax,%eax
> Code;  c01419d1 <prune_dcache+a1/120>
>    5:   74 09                     je     10 <_EIP+0x10> c01419dc <prune_dcache+ac/120>

dentry->d_op == 0x1000000 in dentry_iput(). I'd bet 9:1 that you've got bit 24
flipped (i.e. it was supposed to be NULL and you are seeing the effect of a
hardware problem).


* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03  3:54 ` Linus Torvalds
@ 2001-01-03  4:04   ` Udo A. Steinberg
  2001-01-03  8:29   ` Dan Aloni
  1 sibling, 0 replies; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03  4:04 UTC
  To: Linus Torvalds; +Cc: Linux Kernel

Hi,

Linus Torvalds wrote:
> 
> The strange thing is that 0x01000000 value, which almost certainly should
> just be NULL. A one-bit error.
> 
> Now, I assume this machine has been historically stable, with no history
> of memory corruption problems.. It's entirely possible (and likely) that
> the one-bit error is due to some wild kernel pointer. Which makes this
> _really_ hard to debug.

Yes, the machine is otherwise rock stable, not overclocked, and the memory
timings are rather conservative. Before the oops the machine had been compiling
some major application for about 5 hours, and maybe the excessive stress
flipped a bit somewhere - who knows.

> I'll try to think about it some more, but I'd love to have more reports to
> go on to try to find a pattern..

That's one I can't reproduce. I've just run memtest86 over the entire RAM
and it doesn't show any oddities - which doesn't really rule out an
occasional bit-flip due to neutrino storms though ;-)
If someone else has seen something similar lately, it's time to speak up.

-Udo.

* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03  3:54 ` Linus Torvalds
  2001-01-03  4:04   ` Udo A. Steinberg
@ 2001-01-03  8:29   ` Dan Aloni
  2001-01-03  9:29     ` Alexander Viro
  2001-01-03 11:18     ` Udo A. Steinberg
  1 sibling, 2 replies; 10+ messages in thread
From: Dan Aloni @ 2001-01-03  8:29 UTC
  To: Linus Torvalds; +Cc: Udo A. Steinberg, Linux Kernel

On Tue, 2 Jan 2001, Linus Torvalds wrote:

> On Wed, 3 Jan 2001, Udo A. Steinberg wrote:
> > 
> > While under massive disk and cpu load, 2.4.0-prerelease produced
> > the following oops (decoded below).

[..]

> Now, I assume this machine has been historically stable, with no history
> of memory corruption problems.. It's entirely possible (and likely) that
> the one-bit error is due to some wild kernel pointer. Which makes this
> _really_ hard to debug.

After a bit of code review, it looks like the only code that
assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c. 

Udo, are you using vfat?

-- 
Dan Aloni 
dax@karrde.org


* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03  8:29   ` Dan Aloni
@ 2001-01-03  9:29     ` Alexander Viro
  2001-01-03 11:18     ` Udo A. Steinberg
  1 sibling, 0 replies; 10+ messages in thread
From: Alexander Viro @ 2001-01-03  9:29 UTC
  To: Dan Aloni; +Cc: Linus Torvalds, Udo A. Steinberg, Linux Kernel



On Wed, 3 Jan 2001, Dan Aloni wrote:

> After a bit of code review, it looks like the only code that
> assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c. 
> 
> Udo, are you using vfat?

	If it had been assigned by something that was supposed to set ->d_op,
it would not have gotten such a value. Whatever did that had no idea of
->d_op or struct dentry in the first place.


* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03  8:29   ` Dan Aloni
  2001-01-03  9:29     ` Alexander Viro
@ 2001-01-03 11:18     ` Udo A. Steinberg
  2001-01-03 12:00       ` Alexander Viro
  1 sibling, 1 reply; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03 11:18 UTC
  To: Dan Aloni; +Cc: Linus Torvalds, Linux Kernel

Dan Aloni wrote:
> 
> After a bit of code review, it looks like the only code that
> assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c.
> 
> Udo, are you using vfat?

Yes.

-Udo.

* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03 11:18     ` Udo A. Steinberg
@ 2001-01-03 12:00       ` Alexander Viro
  2001-01-03 12:08         ` Udo A. Steinberg
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Viro @ 2001-01-03 12:00 UTC
  To: Udo A. Steinberg; +Cc: Dan Aloni, Linus Torvalds, Linux Kernel



On Wed, 3 Jan 2001, Udo A. Steinberg wrote:

> Dan Aloni wrote:
> > 
> > After a bit of code review, it looks like the only code that
> > assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c.
> > 
> > Udo, are you using vfat?
> 
> Yes.

In principle, it might be that d_find_alias() is broken. I don't see where
it could happen, but then I'm half-asleep right now...  While we are at it,
do you have
	* autofs
	* knfsd
	* ncpfs


* Re: Oops in prune_dcache (2.4.0-prerelease)
  2001-01-03 12:00       ` Alexander Viro
@ 2001-01-03 12:08         ` Udo A. Steinberg
  0 siblings, 0 replies; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03 12:08 UTC
  To: Alexander Viro; +Cc: Dan Aloni, Linus Torvalds, Linux Kernel

Hi,

Alexander Viro wrote:
>
> In principle, it might be that d_find_alias() is broken. I don't see where
> it could happen, but then I'm half-asleep right now...  While we are at it,
> do you have

>         * autofs

Yes.

>         * knfsd
>         * ncpfs

No, neither of these two.

-Udo.

* Re: Oops in prune_dcache (2.4.0-prerelease)
@ 2001-01-03 18:27 Petr Vandrovec
  0 siblings, 0 replies; 10+ messages in thread
From: Petr Vandrovec @ 2001-01-03 18:27 UTC
  To: Udo A. Steinberg; +Cc: Dan Aloni, Linus Torvalds, Linux Kernel, viro

On  3 Jan 01 at 13:08, Udo A. Steinberg wrote:
> Alexander Viro wrote:
> >
> > In principle, it might be that d_find_alias() is broken. I don't see where
> > it could happen, but then I'm half-asleep right now...  While we are at it,
> > do you have
> 
> >         * autofs
> 
> Yes.
> 
> >         * knfsd
> >         * ncpfs
> 
> No, neither of these two.

I saw oopses in prune_dcache() during umount() of ncpfs about 6 months
ago. As I was never able to reproduce the problem, and it stopped
happening as unexpectedly as it appeared, I never reported it. And
about twice I got an endless loop in d_prune_aliases(), where somehow
the d_alias list ended up looking like

1 -> 2 -> 3 -> 4 -> 2 -> 3 -> 4 ... (maybe after pruning d_count = 0
                                    entries...)

so it never stopped :-( But that really happened long ago, I think
sometime in June-September 2000, and a fair amount of logic has changed
since then in both ncpfs and the VFS.
                                    Best regards,
                                            Petr Vandrovec
                                            vandrove@vc.cvut.cz
