linux-ext4.vger.kernel.org archive mirror
From: Sylvain Rochet <gradator@gradator.net>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption
Date: Tue, 28 Jul 2009 13:27:15 +0200
Message-ID: <20090728112715.GA8442@gradator.net>
In-Reply-To: <20090727154253.GB8332@duck.suse.cz>

Hi,


On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > 
> > > Can you still see the corruption with 2.6.30 kernel?
> > 
> > Not upgraded yet, we'll give it a try.

Done, now featuring 2.6.30.3 ;)


> > > If you can still see this problem, could you run: debugfs /dev/md10
> > > and send output of the command:
> > > stat <40420228>
> > > (or whatever the corrupted inode number will be)
> > > and also:
> > > dump <40420228> /tmp/corrupted_dir
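
The same debugfs requests can also be run non-interactively, which is handy when collecting output for a report. A small Python sketch, assuming e2fsprogs' debugfs is on PATH and relying on its -R option (run a single request, then exit); without -w, debugfs opens the filesystem read-only. The device and inode number are simply the ones from this thread, and the helper names are hypothetical:

```python
import shlex
import subprocess

# Device from this thread; adjust to the affected filesystem.
DEVICE = "/dev/md10"


def debugfs_argv(request: str, device: str = DEVICE) -> list[str]:
    """Build argv for one non-interactive debugfs request (-R runs it and exits)."""
    return ["debugfs", "-R", request, device]


def run_debugfs(request: str, device: str = DEVICE) -> str:
    """Run the request and return debugfs's stdout (needs root and e2fsprogs)."""
    result = subprocess.run(debugfs_argv(request, device),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    ino = 40420228  # the inode number used in the request above
    for req in (f"stat <{ino}>", f"dump <{ino}> /tmp/corrupted_dir"):
        # Print the equivalent shell command instead of running it.
        print(shlex.join(debugfs_argv(req)))
```
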
> > 
> > One inode got corrupted recently; here is the output:
> > 
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> > total 64
> > 88539836 drwxr-sr-x  2 18804 23084  4096 2009-07-25 07:53 .
> > 88539821 drwxr-sr-x 20 18804 23084  4096 2008-08-20 10:14 ..
> > 88541578 -rw-rw-rw-  1 18804 23084   471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> > 88541465 -rw-rw-rw-  1 18804 23084  6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> > 88541471 -rw-rw-rw-  1 18804 23084  1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> > 88541549 -rw-rw-rw-  1 18804 23084  2813 2009-07-25 03:04 INDEX-.edfac52c
> > 88541366 -rw-rw-rw-  1 18804 23084     0 2008-08-17 20:44 .ok
> >        ? ?---------  ? ?     ?         ?                ? spip%3Farticle19.f8740dca
> > 88541671 -rw-rw-rw-  1 18804 23084  5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> > 88541460 -rw-rw-rw-  1 18804 23084  5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> > 88540284 -rw-rw-rw-  1 18804 23084  3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> > 88541539 -rw-rw-rw-  1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
>   OK, so we couldn't stat a directory...
> 
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > cat: spip%3Farticle19.f8740dca: Stale NFS file handle
>   This is probably the misleading output from ext3_iget(). It should give
> you EIO in the latest kernel.

root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
cat: spip%3Farticle19.f8740dca: Input/output error

That makes much more sense now. We thought the problem was related to NFS
because of the previous error message; that was probably not the right
direction to look in.


> > root@bazooka:~# debugfs /dev/md10
> > debugfs 1.40-WIP (14-Nov-2006)
> > 
> >     debugfs:  stat <88539836>
> > 
> > Inode: 88539836   Type: directory    Mode:  0755   Flags: 0x0   Generation: 791796957
> > User: 18804   Group: 23084   Size: 4096
> > File ACL: 0    Directory ACL: 0
> > Links: 2   Blockcount: 8
> > Fragment:  Address: 0    Number: 0    Size: 0
> > ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> > mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > Size of extra inode fields: 4
> > BLOCKS:
> > (0):177096928
> > TOTAL: 1
> > 
> > 
> >     debugfs:  ls <88539836>
> > 
> >  88539836  (12) .    88539821  (32) ..    88541366  (12) .ok
> >  88541465  (56) -inc_rss_item-32-wa.23d91cc2
> >  88541539  (40) spip%3Fpage%3Djquery.cce608b6.gz
> >  88540284  (40) spip%3Fpage%3Dforum-30.63b2c1b1
> >  88541460  (28) spip%3Fmot5.f3e9adda
> >  88541471  (160) -inc_rubriques-17-wa.f2f152f0
> >  88541549  (24) INDEX-.edfac52c    88541578  (284) -inc_forum-10-wa.3cb1921f
> >  88541562  (36) spip%3Farticle19.f8740dca
> >  88541671  (3372) spip%3Fauteur1.c64f7f7e
>   The directory itself looks fine...
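
One quick way to sanity-check that the directory block is consistent: in debugfs "ls" output the number in parentheses is each entry's rec_len, and in a healthy single-block ext3 directory those lengths should tile the 4096-byte block exactly, each at least as large as the padded size its name needs. A minimal sketch, assuming the ext3 dirent layout with an 8-byte header and 4-byte padding; it checks record lengths only, nothing else:

```python
# Entries as printed by "ls <88539836>" above: (inode, rec_len, name).
ENTRIES = [
    (88539836, 12, "."),
    (88539821, 32, ".."),
    (88541366, 12, ".ok"),
    (88541465, 56, "-inc_rss_item-32-wa.23d91cc2"),
    (88541539, 40, "spip%3Fpage%3Djquery.cce608b6.gz"),
    (88540284, 40, "spip%3Fpage%3Dforum-30.63b2c1b1"),
    (88541460, 28, "spip%3Fmot5.f3e9adda"),
    (88541471, 160, "-inc_rubriques-17-wa.f2f152f0"),
    (88541549, 24, "INDEX-.edfac52c"),
    (88541578, 284, "-inc_forum-10-wa.3cb1921f"),
    (88541562, 36, "spip%3Farticle19.f8740dca"),
    (88541671, 3372, "spip%3Fauteur1.c64f7f7e"),
]

BLOCK_SIZE = 4096  # the directory has "Size: 4096" and a single block


def min_rec_len(name: str) -> int:
    # 8-byte header (inode, rec_len, name_len, file_type) + name, padded to 4.
    return (8 + len(name.encode()) + 3) & ~3


def check_dir_block(entries, block_size=BLOCK_SIZE):
    """Return a list of rec_len problems; an empty list means the lengths are consistent."""
    problems = []
    for _ino, rec_len, name in entries:
        if rec_len % 4 != 0:
            problems.append(f"{name}: rec_len {rec_len} is not a multiple of 4")
        if rec_len < min_rec_len(name):
            problems.append(f"{name}: rec_len {rec_len} < minimum {min_rec_len(name)}")
    total = sum(rec_len for _, rec_len, _ in entries)
    if total != block_size:
        problems.append(f"rec_len total {total} != block size {block_size}")
    return problems


print(check_dir_block(ENTRIES) or "block layout consistent")
```

For this directory the lengths sum to exactly 4096 and every entry is large enough. The oversized values (32 for "..", 160, 284 and 3372) are just slack: space from removed entries merged into the preceding one, or unused space at the end of the block.
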
> 
> >     debugfs:  stat <88541562>
> > 
> > Inode: 88541562   Type: regular    Mode:  0666   Flags: 0x0   Generation: 860068541
> > User: 18804   Group: 23084   Size: 0
> > File ACL: 0    Directory ACL: 0
> > Links: 0   Blockcount: 0
> > Fragment:  Address: 0    Number: 0    Size: 0
> > ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> > mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > Size of extra inode fields: 4
> > BLOCKS:
> 
>   Ah, OK, here's the problem. The directory points to a file which is
> obviously deleted (note the "Links: 0"). All the content of the inode seems
> to indicate that the file was correctly deleted (you might check that the
> corresponding bit in the bitmap is cleared via: "icheck 88541562").

root@bazooka:~# debugfs /dev/md10
debugfs 1.40-WIP (14-Nov-2006)
debugfs:  icheck 88541562
Block   Inode number
88541562        <block not found>
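
Note that debugfs's icheck takes block numbers and prints the inode owning each block, so the output above concerns block 88541562 rather than inode 88541562; debugfs's "testi <88541562>" tests the inode bitmap bit directly. To locate that bit by hand: ext3 numbers inodes from 1 and spreads them evenly over block groups. A sketch, where INODES_PER_GROUP is a hypothetical placeholder; the real value is the "Inodes per group:" line of "dumpe2fs -h /dev/md10":

```python
# Hypothetical placeholder: read the real value with "dumpe2fs -h /dev/md10".
INODES_PER_GROUP = 8192


def inode_bitmap_location(ino: int, inodes_per_group: int = INODES_PER_GROUP) -> tuple[int, int]:
    """Return (block_group, bit_index) of an inode in its group's inode bitmap.

    ext3 inode numbers start at 1, so inode N is bit (N - 1) % inodes_per_group
    in the inode bitmap of block group (N - 1) // inodes_per_group.
    """
    if ino < 1:
        raise ValueError("ext3 inode numbers start at 1")
    return divmod(ino - 1, inodes_per_group)


group, bit = inode_bitmap_location(88541562)
print(f"inode 88541562 -> block group {group}, inode bitmap bit {bit}")
```
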


>   The question is how it could happen that the directory still points to
> the inode. Really strange. It looks as if we've lost a write to the
> directory but I don't see how. Are there any suspicious kernel messages in
> this case?
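
The raw timestamps in the two stat outputs above can be compared directly: the deleted inode's dtime is about an hour earlier than the directory's last mtime/ctime, so the directory was modified again after the file was deleted, yet its block still holds the entry. A small sketch decoding them; debugfs prints local time, which is why its output reads two hours later than the UTC values here:

```python
from datetime import datetime, timezone

# Raw on-disk values copied from the two "stat" outputs above.
DIR_MTIME = 0x4a6a9dd5   # directory <88539836>, mtime and ctime
FILE_DTIME = 0x4a6a8fac  # deleted inode <88541562>, dtime


def utc(ts: int) -> datetime:
    # ext3 stores seconds since the Unix epoch.
    return datetime.fromtimestamp(ts, tz=timezone.utc)


print("inode deleted:", utc(FILE_DTIME))
print("dir modified: ", utc(DIR_MTIME))
print("gap:", DIR_MTIME - FILE_DTIME, "seconds")
```
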

There was nothing for a while, but since the reboot there have been some
messages about this inode:

EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562


> > We'll try.
> 
>   It probably won't help. This particular directory had just one block so
> DIR_INDEX had no effect on it.

Let's keep dir_index for now, then.


>   OK, so it's probably not a storage device problem. Good to know.

We also suspected the motherboard, CPU, or chassis, but all of them have
since been replaced.


The check of the MD RAID6 array always completes successfully:

Jul  5 01:06:01 bazooka kernel: md: data-check of RAID array md10
Jul  5 01:06:01 bazooka kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jul  5 01:06:01 bazooka kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jul  5 01:06:01 bazooka kernel: md: using 128k window, over a total of 143373888 blocks.
Jul  5 04:28:28 bazooka kernel: md: md10: data-check done.
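
The "data-check done" line by itself does not say whether parity mismatches were found; md records that count in sysfs as mismatch_cnt after a check. A minimal sketch for reading it, assuming the usual /sys/block/<array>/md/mismatch_cnt location (the function name is hypothetical):

```python
from pathlib import Path


def mismatch_count(md: str = "md10", sysfs_root: str = "/sys/block") -> int:
    """Return md's mismatch_cnt, as counted by the last check or repair."""
    path = Path(sysfs_root) / md / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        count = mismatch_count()
        print("mismatch_cnt:", count, "(clean)" if count == 0 else "(mismatches found)")
    except FileNotFoundError:
        print("no /sys/block/md10/md/mismatch_cnt on this machine")
```
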


We never saw any modification of the file data itself; it may have
happened, but we never saw any evidence of it. Of course, because the
filesystem structure was modified, we did see files replaced by other
files ;)


Sylvain

Thread overview: 17+ messages
     [not found] <20090420162017.GA28079@gradator.net>
     [not found] ` <20090716172749.GC3740@atrey.karlin.mff.cuni.cz>
     [not found]   ` <20090725151751.GA6419@gradator.net>
2009-07-27 15:42     ` 2.6.28.9: EXT3/NFS inodes corruption Jan Kara
2009-07-28 11:27       ` Sylvain Rochet [this message]
     [not found]         ` <20090728112715.GA8442-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
2009-07-28 13:52           ` Jan Kara
2009-07-28 16:41             ` Sylvain Rochet
2009-07-28 21:12               ` J. Bruce Fields
2009-08-04 10:50                 ` Sylvain Rochet
2009-07-29 12:58               ` Jan Kara
2009-08-04 11:02                 ` Sylvain Rochet
     [not found]               ` <20090728164142.GA13662-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
2009-08-03 22:29                 ` Jan Kara
2009-08-04 11:15                   ` Sylvain Rochet
     [not found]                     ` <20090804111505.GA6433-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
2009-08-04 22:56                       ` Jan Kara
     [not found]                         ` <20090804225619.GB11097-pwKtmJkCtMINMLpHRKhSow@public.gmane.org>
2009-08-06 13:15                           ` Sylvain Rochet
     [not found]                             ` <20090806131555.GA23359-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
2009-08-06 17:05                               ` J. Bruce Fields
2009-08-12 22:34                               ` Jan Kara
     [not found]                                 ` <20090812223453.GC10729-pwKtmJkCtMINMLpHRKhSow@public.gmane.org>
2009-08-20 17:19                                   ` Sylvain Rochet
     [not found]                                     ` <20090820171952.GA15133-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
2009-08-21  0:00                                       ` Simon Kirby
2009-08-21 10:51                                         ` Sylvain Rochet
