From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Manfred Schwarb <manfred99@gmx.ch>
Cc: linux-kernel@vger.kernel.org, reiserfs-dev@namesys.com,
trond.myklebust@fys.uio.no, Hans Reiser <reiser@namesys.com>
Subject: Re: 2.4.29-pre2 Oops at find_inode/reiserfs_find_actor
Date: Tue, 28 Dec 2004 09:24:43 -0200
Message-ID: <20041228112443.GA25253@logos.cnet>
In-Reply-To: <1742.1103673366@www47.gmx.net>

On Wed, Dec 22, 2004 at 12:56:06AM +0100, Manfred Schwarb wrote:
> > > >>EIP; c0153b09 <find_inode+19/70> <=====
> > >
> > > >>eax; e0b3c9e0 <[reiserfs]reiserfs_find_actor+0/40>
> > > >>edx; dff80000 <_end+1fbfd7f4/20792854>
> > > >>edi; dffa2a58 <_end+1fc2024c/20792854>
> > > >>esp; d7f05d60 <_end+17b83554/20792854>
> > >
> > > Trace; c0153f4e <iget4_locked+5e/110>
> > > Trace; e0b3c9e0 <[reiserfs]reiserfs_find_actor+0/40>
> > > Trace; e0b3ca60 <[reiserfs]reiserfs_iget+40/c0>
> > > Trace; e0b3c9e0 <[reiserfs]reiserfs_find_actor+0/40>
> > > Trace; e0b37b11 <[reiserfs]reiserfs_lookup+101/120>
> > > Trace; c0151d1c <d_alloc+1c/1d0>
> > > Trace; c01491cf <lookup_hash+9f/d0>
> > > Trace; c0149279 <lookup_one_len+79/90>
> > > Trace; e118fd71 <[nfsd]nfsd_lookup+d1/490>
> > > Trace; e1196d39 <[nfsd]nfsd3_proc_lookup+a9/140>
> > > Trace; e119de8c <[nfsd]nfsd_procedures3+6c/320>
> > > Trace; e118c68d <[nfsd]nfsd_dispatch+14d/220>
> > > Trace; c027ab3e <svc_process+3de/590>
> > > Trace; e119de8c <[nfsd]nfsd_procedures3+6c/320>
> > > Trace; e119d758 <[nfsd]nfsd_version3+0/10>
> > > Trace; e119d778 <[nfsd]nfsd_program+0/28>
> > > Trace; e118c3cb <[nfsd]nfsd+1bb/330>
> > > Trace; c010729b <arch_kernel_thread+2b/40>
> > > Trace; e118c210 <[nfsd]nfsd+0/330>
> > >
> > > Code; c0153b09 <find_inode+19/70>
> > > 00000000 <_EIP>:
> > > Code; c0153b09 <find_inode+19/70> <=====
> > > 0: 39 6b 28 cmp %ebp,0x28(%ebx) <=====
> > > Code; c0153b0c <find_inode+1c/70>
> > > 3: 89 de mov %ebx,%esi
> > > Code; c0153b0e <find_inode+1e/70>
> > > 5: 75 f1 jne fffffff8 <_EIP+0xfffffff8>
> > > Code; c0153b10 <find_inode+20/70>
> > > 7: 8b 44 24 20 mov 0x20(%esp,1),%eax
> > > Code; c0153b14 <find_inode+24/70>
> > > b: 39 83 a0 00 00 00 cmp %eax,0xa0(%ebx)
> > > Code; c0153b1a <find_inode+2a/70>
> > > 11: 75 e5 jne fffffff8 <_EIP+0xfffffff8>
> > > Code; c0153b1c <find_inode+2c/70>
> > > 13: 8b 00 mov (%eax),%eax
> >
> > This is indeed corruption - an inode in this hash bucket has "->next" as
> > NULL, so find_inode goes boom.
> >
> > Something is leaving this hash bucket list corrupt.
> >
> > Have you ever seen this crash before ? Can you reproduce it?
> >
>
> No, not at all. This machine was running for 10 months with 2.4.xx
> kernels without any problems. Since this oops, I tried to reproduce
> the particular situation (rsync over nfs, and some additional load),
> but I had no success in crashing the box:
> I mirrored the box (ca. 1 million files) 5 times, no problem.
>
>
> > The only inode corruption case that was reliable was from Chris Caputo
> > and we ended up agreeing that it was most likely a hardware issue, because
> > it was hard to reproduce and strange in several ways. Can you point me at
> > the "quite some similar reports" you have found, please ?
> >
>
> Just my first impression, a closer look showed that most of the
> cases group around 2.4.1[789] because of lacking reiserfs_find_actor.
> Sorry for the overstatement.
>
> A recent oops report shows some similarity, using a 2.6.7 kernel:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109278828905885&w=2
>
>
> Hardware issue: you mean memory? Last winter I ran memtest86
> during a weekend, everything was fine. At the moment I can't
> take this box offline for a longer period to test again, so I
> tend to believe memory is ok, and knock on wood...

Yes, what I'm saying is that no reliable inode corruption case has been
reported recently, except when hardware was flaky (usually memory errors).

I'm not saying that this is your case - it's just one possible explanation
for the problem. It might well be a software problem.

Hans, did any of your developers see similar inode cache hashtable corruption
in v2.4.x?
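
For reference, the failure mode described in the quoted analysis (an entry
in the inode hash chain whose "->next" is NULL) can be illustrated with a
small user-space sketch. This is not the 2.4 fs/inode.c code: toy_inode,
toy_list_add, toy_find_inode and the WALK_* results are hypothetical names
made up for the illustration, and the NULL check is added only so the sketch
reports the corruption instead of faulting. A walk with no such check would
dereference the bad pointer on the next iteration.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, stripped-down inode: only the hash link and the number. */
struct toy_inode {
    struct list_head i_hash;  /* first member, so a cast recovers the inode */
    unsigned long i_ino;
};

enum walk_result { WALK_FOUND, WALK_NOT_FOUND, WALK_CORRUPT };

/* Insert entry right after head, as a hash-bucket insertion would. */
static void toy_list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Walk the bucket looking for ino. A healthy chain is circular and ends
 * when the walk returns to head. A NULL next pointer means the chain was
 * corrupted; this sketch reports it rather than dereferencing it.
 */
static enum walk_result toy_find_inode(struct list_head *head,
                                       unsigned long ino,
                                       struct toy_inode **out)
{
    struct list_head *tmp = head;

    *out = NULL;
    for (;;) {
        struct toy_inode *inode;

        tmp = tmp->next;
        if (tmp == NULL)
            return WALK_CORRUPT;    /* an unchecked walk would crash here */
        if (tmp == head)
            return WALK_NOT_FOUND;  /* wrapped around: bucket exhausted */

        inode = (struct toy_inode *)tmp;
        if (inode->i_ino == ino) {
            *out = inode;
            return WALK_FOUND;
        }
    }
}
```

Note that a lookup only trips over the damaged entry if it has to walk past
it; a lookup that matches earlier in the chain still succeeds, which is one
way such corruption can stay hidden for a while.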