* Oops in prune_dcache (2.4.0-prerelease)
@ 2001-01-03 3:39 Udo A. Steinberg
2001-01-03 3:54 ` Linus Torvalds
2001-01-03 4:00 ` Alexander Viro
0 siblings, 2 replies; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03 3:39 UTC (permalink / raw)
To: Linus Torvalds, Linux Kernel
Hi Linus et al.,
While under massive disk and CPU load, 2.4.0-prerelease produced
the following oops (decoded below).
Keith, I've read the FAQ about having been bitten by Makefile bugs
with certain symbols and such, yet I still get these symbol warnings
even after a mrproper rebuild. Any clues?
-Udo.
ksymoops 2.3.5 on i686 2.4.0-prerelease. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.0-prerelease/ (default)
-m /boot/System.map-2.4.0-prerelease (specified)
Warning (compare_maps): ksyms_base symbol acpi_clear_event_R__ver_acpi_clear_event not found in System.map. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol acpi_cm_memcpy_R__ver_acpi_cm_memcpy not found in System.map. Ignoring ksyms_base entry
[ ** many other warnings snipped to reduce spam ** ]
Unable to handle kernel paging request at virtual address 01000014
c01419cc
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01419cc>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 01000000 ebx: c20847e0 ecx: c2081d10 edx: c2081d10
esi: c20847c0 edi: c2081d00 ebp: 00002c79 esp: c147bfa4
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 3, stackpage=c147b000)
Stack: 00010f00 00000003 00000004 00000000 c0141cc1 00006ea6 c012a0d3 00000006
00000004 00010f00 c023e1f1 c147a239 0008e000 c012a19a 00000004 00000000
cffe5fbc c0105000 ffffff9c c01074d3 00000000 c02d6d64 c02a5fdc
Call Trace: [<c0141cc1>] [<c012a0d3>] [<c023e1f1>] [<c012a19a>] [<c0105000>] [<c01074d3>]
Code: 8b 40 14 85 c0 74 09 57 56 ff d0 83 c4 08 eb 0c 57 e8 be 1b
>>EIP; c01419cc <prune_dcache+9c/120> <=====
Trace; c0141cc1 <shrink_dcache_memory+21/30>
Trace; c012a0d3 <do_try_to_free_pages+53/90>
Trace; c023e1f1 <tvecs+2169/1b124>
Trace; c012a19a <kswapd+8a/140>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c01074d3 <kernel_thread+23/30>
Code; c01419cc <prune_dcache+9c/120>
00000000 <_EIP>:
Code; c01419cc <prune_dcache+9c/120> <=====
0: 8b 40 14 movl 0x14(%eax),%eax <=====
Code; c01419cf <prune_dcache+9f/120>
3: 85 c0 testl %eax,%eax
Code; c01419d1 <prune_dcache+a1/120>
5: 74 09 je 10 <_EIP+0x10> c01419dc <prune_dcache+ac/120>
Code; c01419d3 <prune_dcache+a3/120>
7: 57 pushl %edi
Code; c01419d4 <prune_dcache+a4/120>
8: 56 pushl %esi
Code; c01419d5 <prune_dcache+a5/120>
9: ff d0 call *%eax
Code; c01419d7 <prune_dcache+a7/120>
b: 83 c4 08 addl $0x8,%esp
Code; c01419da <prune_dcache+aa/120>
e: eb 0c jmp 1c <_EIP+0x1c> c01419e8 <prune_dcache+b8/120>
Code; c01419dc <prune_dcache+ac/120>
10: 57 pushl %edi
Code; c01419dd <prune_dcache+ad/120>
11: e8 be 1b 00 00 call 1bd4 <_EIP+0x1bd4> c01435a0 <iput+0/150>
45 warnings issued. Results may not be reliable.
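The address arithmetic in the decode above can be checked by hand. The following is a minimal C sketch, not ksymoops code: the helper names `call_rel32_target` and `symbol_base` are hypothetical. It assumes the `call` bytes are exactly as ksymoops prints them (`e8 be 1b 00 00` at c01419dd) and relies on the x86 rule that a near-call displacement is relative to the address of the following instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an x86 near call: opcode 0xe8 followed by a 32-bit
 * little-endian displacement. The displacement is relative to the
 * address of the NEXT instruction, i.e. insn_addr + 5. */
static uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xe8);
    uint32_t disp = (uint32_t)insn[1]
                  | (uint32_t)insn[2] << 8
                  | (uint32_t)insn[3] << 16
                  | (uint32_t)insn[4] << 24;
    /* Unsigned arithmetic wraps modulo 2^32, so negative
     * displacements are handled as well. */
    return insn_addr + 5u + disp;
}

/* Start address of a function, given an address inside it and the
 * "+offset" that ksymoops prints (for example prune_dcache+9c). */
static uint32_t symbol_base(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}
```

With the values from the oops, `call_rel32_target(0xc01419dd, ...)` yields c01435a0, which matches the `iput+0` annotation, and `symbol_base(0xc01419cc, 0x9c)` puts the start of prune_dcache at c0141930.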
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 3:39 Udo A. Steinberg
@ 2001-01-03 3:54 ` Linus Torvalds
2001-01-03 4:04 ` Udo A. Steinberg
2001-01-03 8:29 ` Dan Aloni
2001-01-03 4:00 ` Alexander Viro
1 sibling, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2001-01-03 3:54 UTC (permalink / raw)
To: Udo A. Steinberg; +Cc: Linux Kernel
On Wed, 3 Jan 2001, Udo A. Steinberg wrote:
>
> While under massive disk and CPU load, 2.4.0-prerelease produced
> the following oops (decoded below).
Hmm.. If I'm not mistaken, this is in dentry_iput() (inline function
called by prune_one_dentry(), which is _also_ an inline function, which
is why it gets reported as being in prune_dcache):
        if (dentry->d_op && dentry->d_op->d_iput)
                dentry->d_op->d_iput(dentry, inode);
and it looks like your dentry->d_op has a value of 0x01000000, so when we
load the d_op->d_iput pointer, we get a page fault.
The strange thing is that 0x01000000 value, which almost certainly should
just be NULL. A one-bit error.
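To make the link between the d_op value and the faulting address concrete, here is a small C model. It is only a sketch: `struct model_dentry_ops` is a hypothetical stand-in with 32-bit slots, and its field order is an assumption chosen so that d_iput sits at offset 0x14, the displacement seen in `movl 0x14(%eax),%eax`. It is not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only: an i386-style operations table made of
 * 32-bit function-pointer slots. The field order is an assumption. */
struct model_dentry_ops {
    uint32_t d_revalidate;
    uint32_t d_hash;
    uint32_t d_compare;
    uint32_t d_delete;
    uint32_t d_release;
    uint32_t d_iput;
};

/* d_iput must land at the displacement used by the faulting load. */
_Static_assert(offsetof(struct model_dentry_ops, d_iput) == 0x14,
               "model does not match the disassembly");

/* The "dentry->d_op &&" test only rejects NULL; every other value,
 * valid pointer or not, goes on to be dereferenced. */
static int d_op_passes_null_check(uint32_t d_op)
{
    return d_op != 0;
}

/* Address the CPU reads when it evaluates d_op->d_iput. */
static uint32_t d_iput_load_address(uint32_t d_op)
{
    return d_op + (uint32_t)offsetof(struct model_dentry_ops, d_iput);
}
```

Feeding in eax from the register dump, 0x01000000 passes the NULL check and the load address comes out as 0x01000014, the address in the "Unable to handle kernel paging request" line.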
Now, I assume this machine has been historically stable, with no history
of memory corruption problems.. It's entirely possible (and likely) that
the one-bit error is due to some wild kernel pointer. Which makes this
_really_ hard to debug.
I'll try to think about it some more, but I'd love to have more reports to
go on to try to find a pattern..
Linus
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 3:39 Udo A. Steinberg
2001-01-03 3:54 ` Linus Torvalds
@ 2001-01-03 4:00 ` Alexander Viro
1 sibling, 0 replies; 10+ messages in thread
From: Alexander Viro @ 2001-01-03 4:00 UTC (permalink / raw)
To: Udo A. Steinberg; +Cc: Linus Torvalds, Linux Kernel
On Wed, 3 Jan 2001, Udo A. Steinberg wrote:
>
> Hi Linus et al.,
>
> While under massive disk and CPU load, 2.4.0-prerelease produced
> the following oops (decoded below).
> Unable to handle kernel paging request at virtual address 01000014
> Code; c01419cc <prune_dcache+9c/120> <=====
> 0: 8b 40 14 movl 0x14(%eax),%eax <=====
> Code; c01419cf <prune_dcache+9f/120>
> 3: 85 c0 testl %eax,%eax
> Code; c01419d1 <prune_dcache+a1/120>
> 5: 74 09 je 10 <_EIP+0x10> c01419dc <prune_dcache+ac/120>
dentry->d_op == 0x1000000 in dentry_iput(). 9:1 that you've got bit 24 flipped
(i.e. it was supposed to be NULL and you are seeing an effect of hardware
problem).
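The "bit 24" claim is easy to verify mechanically: XOR the expected value (NULL) with the observed one and look at which bits remain set. A minimal C sketch follows; the function names `bits_differing` and `flipped_bit` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bit positions in which two 32-bit words differ. */
static int bits_differing(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;
    int n = 0;
    while (diff) {
        diff &= diff - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index (0 = least significant) of the single differing bit.
 * Only meaningful when exactly one bit differs. */
static int flipped_bit(uint32_t expected, uint32_t observed)
{
    assert(bits_differing(expected, observed) == 1);
    uint32_t diff = expected ^ observed;
    int i = 0;
    while ((diff & 1u) == 0) {
        diff >>= 1;
        i++;
    }
    return i;
}
```

For expected 0 and observed 0x01000000, exactly one bit differs and its index is 24. This shows only that the value is one bit away from NULL; it cannot say whether hardware or a stray write caused it.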
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 3:54 ` Linus Torvalds
@ 2001-01-03 4:04 ` Udo A. Steinberg
2001-01-03 8:29 ` Dan Aloni
1 sibling, 0 replies; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03 4:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux Kernel
Hi,
Linus Torvalds wrote:
>
> The strange thing is that 0x01000000 value, which almost certainly should
> just be NULL. A one-bit error.
>
> Now, I assume this machine has been historically stable, with no history
> of memory corruption problems.. It's entirely possible (and likely) that
> the one-bit error is due to some wild kernel pointer. Which makes this
> _really_ hard to debug.
Yes the machine is otherwise rock stable, not overclocked and memory timings
are rather conservative. Before the oops the machine had been compiling some
major application for like 5 hours and maybe the excessive stress kicked a
bit somewhere - who knows.
> I'll try to think about it some more, but I'd love to have more reports to
> go on to try to find a pattern..
That's one I can't reproduce. I've just run memtest86 over the entire ram
and it doesn't show any oddities - which doesn't really rule out an
occasional bit-flip due to neutrino storms though ;-)
If someone else has seen something similar lately, it's time to speak up.
-Udo.
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 3:54 ` Linus Torvalds
2001-01-03 4:04 ` Udo A. Steinberg
@ 2001-01-03 8:29 ` Dan Aloni
2001-01-03 9:29 ` Alexander Viro
2001-01-03 11:18 ` Udo A. Steinberg
1 sibling, 2 replies; 10+ messages in thread
From: Dan Aloni @ 2001-01-03 8:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Udo A. Steinberg, Linux Kernel
On Tue, 2 Jan 2001, Linus Torvalds wrote:
> On Wed, 3 Jan 2001, Udo A. Steinberg wrote:
> >
> > While under massive disk and CPU load, 2.4.0-prerelease produced
> > the following oops (decoded below).
[..]
> Now, I assume this machine has been historically stable, with no history
> of memory corruption problems.. It's entirely possible (and likely) that
> the one-bit error is due to some wild kernel pointer. Which makes this
> _really_ hard to debug.
After a bit of code reviewing, it looks like the only code that
assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c.
Udo, are you using vfat?
--
Dan Aloni
dax@karrde.org
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 8:29 ` Dan Aloni
@ 2001-01-03 9:29 ` Alexander Viro
2001-01-03 11:18 ` Udo A. Steinberg
1 sibling, 0 replies; 10+ messages in thread
From: Alexander Viro @ 2001-01-03 9:29 UTC (permalink / raw)
To: Dan Aloni; +Cc: Linus Torvalds, Udo A. Steinberg, Linux Kernel
On Wed, 3 Jan 2001, Dan Aloni wrote:
> After a bit of code reviewing, it looks like the only code that
> assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c.
>
> Udo, are you using vfat?
If it had been assigned by something that was supposed to set ->d_op,
it would not have such a value. Whatever did that had no idea of
->d_op or struct dentry in the first place.
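One way to see why 0x01000000 cannot be a legitimate assignment: code that sets ->d_op stores either NULL or the address of an operations table, and such a table lives in kernel address space. The C sketch below encodes that as a plausibility test. It assumes the default i386 3G/1G split with kernel addresses starting at 0xC0000000; `MODEL_KERNEL_BASE` and `plausible_d_op` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split, where kernel virtual
 * addresses begin at 0xC0000000. */
#define MODEL_KERNEL_BASE 0xC0000000u

/* A d_op value is plausible if it is NULL or points into kernel
 * address space. This is a necessary condition only; it does not
 * prove the pointer is valid. */
static int plausible_d_op(uint32_t value)
{
    return value == 0 || value >= MODEL_KERNEL_BASE;
}
```

Under that assumption NULL and kernel-range addresses such as c20847e0 (ebx in the register dump) pass, while 0x01000000 is rejected.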
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 8:29 ` Dan Aloni
2001-01-03 9:29 ` Alexander Viro
@ 2001-01-03 11:18 ` Udo A. Steinberg
2001-01-03 12:00 ` Alexander Viro
1 sibling, 1 reply; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03 11:18 UTC (permalink / raw)
To: Dan Aloni; +Cc: Linus Torvalds, Linux Kernel
Dan Aloni wrote:
>
> After a bit of code reviewing, it looks like the only code that
> assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c.
>
> Udo, are you using vfat?
Yes.
-Udo.
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 11:18 ` Udo A. Steinberg
@ 2001-01-03 12:00 ` Alexander Viro
2001-01-03 12:08 ` Udo A. Steinberg
0 siblings, 1 reply; 10+ messages in thread
From: Alexander Viro @ 2001-01-03 12:00 UTC (permalink / raw)
To: Udo A. Steinberg; +Cc: Dan Aloni, Linus Torvalds, Linux Kernel
On Wed, 3 Jan 2001, Udo A. Steinberg wrote:
> Dan Aloni wrote:
> >
> > After a bit of code reviewing, it looks like the only code that
> > assigns stuff to ->d_op in a nonstandard way is in fs/vfat/namei.c.
> >
> > Udo, are you using vfat?
>
> Yes.
In principle, it might be that d_find_alias() is broken. I don't see where
it could happen, but then I'm half-asleep right now... While we are at it,
do you have
* autofs
* knfsd
* ncpfs
* Re: Oops in prune_dcache (2.4.0-prerelease)
2001-01-03 12:00 ` Alexander Viro
@ 2001-01-03 12:08 ` Udo A. Steinberg
0 siblings, 0 replies; 10+ messages in thread
From: Udo A. Steinberg @ 2001-01-03 12:08 UTC (permalink / raw)
To: Alexander Viro; +Cc: Dan Aloni, Linus Torvalds, Linux Kernel
Hi,
Alexander Viro wrote:
>
> In principle, it might be that d_find_alias() is broken. I don't see where
> it could happen, but then I'm half-asleep right now... While we are at it,
> do you have
> * autofs
Yes.
> * knfsd
> * ncpfs
No, neither of these two.
-Udo.
* Re: Oops in prune_dcache (2.4.0-prerelease)
@ 2001-01-03 18:27 Petr Vandrovec
0 siblings, 0 replies; 10+ messages in thread
From: Petr Vandrovec @ 2001-01-03 18:27 UTC (permalink / raw)
To: Udo A. Steinberg; +Cc: Dan Aloni, Linus Torvalds, Linux Kernel, viro
On 3 Jan 01 at 13:08, Udo A. Steinberg wrote:
> Alexander Viro wrote:
> >
> > In principle, it might be that d_find_alias() is broken. I don't see where
> > it could happen, but then I'm half-asleep right now... While we are at it,
> > do you have
>
> > * autofs
>
> Yes.
>
> > * knfsd
> > * ncpfs
>
> No, neither of these two.
I saw oopses in prune_dcache() during umount() of ncpfs about 6 months
ago. As I was never able to reproduce the problem, and it stopped
happening as unexpectedly as it had appeared, I never reported it. And
about twice I got an endless loop in d_prune_aliases(), where it somehow
happened that the d_alias list looked like
1 -> 2 -> 3 -> 4 -> 2 -> 3 -> 4 ... (maybe after pruning d_count = 0
entries...)
so it never stopped :-( But it really happened a long time ago, I think
sometime in June-September 2000, and a couple of things in the logic
have changed since then in both ncpfs and vfs.
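A list shaped like 1 -> 2 -> 3 -> 4 -> 2 ... can be detected without extra memory using Floyd's tortoise-and-hare technique. The C sketch below is a simplified model, not kernel code: it uses a hypothetical NULL-terminated singly linked `struct node`, whereas the real d_alias list is a circular doubly linked list with a head in the inode.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a list entry; not the kernel's list_head. */
struct node {
    int id;
    struct node *next;
};

/* Floyd's cycle detection: advance one pointer by one step and the
 * other by two. If the list loops back on itself the two pointers
 * eventually meet; if it ends, the fast pointer reaches NULL.
 * Returns 1 when a cycle exists, 0 otherwise. */
static int list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

Building nodes 1 to 4 in a straight chain gives no cycle; pointing node 4 back at node 2 reproduces the shape above and the function reports a cycle, even though the loop never returns to node 1.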
Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz