From: Sylvain Rochet <gradator@gradator.net>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org
Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption
Date: Sat, 25 Jul 2009 17:17:52 +0200
Message-ID: <20090725151751.GA6419@gradator.net>
In-Reply-To: <20090716172749.GC3740@atrey.karlin.mff.cuni.cz>
Hi,
Sorry for the late answer; I was waiting for the problem to happen again ;)
On Thu, Jul 16, 2009 at 07:27:49PM +0200, Jan Kara wrote:
> Hi,
>
> > We(TuxFamily) are having some inodes corruptions on a NFS server.
> >
> > So, let's start with the facts.
> >
> >
> > ==== NFS Server
> >
> > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
>
> Can you still see the corruption with 2.6.30 kernel?
Not upgraded yet, we'll give it a try.
> If you can still see this problem, could you run: debugfs /dev/md10
> and send output of the command:
> stat <40420228>
> (or whatever the corrupted inode number will be)
> and also:
> dump <40420228> /tmp/corrupted_dir
One inode got corrupted recently; here is the output:
root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
total 64
88539836 drwxr-sr-x 2 18804 23084 4096 2009-07-25 07:53 .
88539821 drwxr-sr-x 20 18804 23084 4096 2008-08-20 10:14 ..
88541578 -rw-rw-rw- 1 18804 23084 471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
88541465 -rw-rw-rw- 1 18804 23084 6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
88541471 -rw-rw-rw- 1 18804 23084 1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
88541549 -rw-rw-rw- 1 18804 23084 2813 2009-07-25 03:04 INDEX-.edfac52c
88541366 -rw-rw-rw- 1 18804 23084 0 2008-08-17 20:44 .ok
? ?--------- ? ? ? ? ? spip%3Farticle19.f8740dca
88541671 -rw-rw-rw- 1 18804 23084 5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
88541460 -rw-rw-rw- 1 18804 23084 5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
88540284 -rw-rw-rw- 1 18804 23084 3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
88541539 -rw-rw-rw- 1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
cat: spip%3Farticle19.f8740dca: Stale NFS file handle
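To look for other entries in the same state, a directory can be scanned for names whose stat fails the way the entry above does. This is only a minimal sketch: find_stale_entries is a hypothetical helper name, and it assumes that lstat() on a damaged entry fails with the same ESTALE error that cat reported on open.

```python
import errno
import os


def find_stale_entries(directory):
    """Return the sorted names in `directory` whose lstat() fails with ESTALE.

    Assumption: a corrupted entry makes lstat() fail with ESTALE, as the
    `cat` error above suggests. Any other error is re-raised unchanged.
    """
    stale = []
    for name in os.listdir(directory):
        try:
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                stale.append(name)
            else:
                raise
    return sorted(stale)
```

Calling find_stale_entries() on the cache directory shown above would be expected to report spip%3Farticle19.f8740dca if that assumption holds.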
root@bazooka:~# debugfs /dev/md10
debugfs 1.40-WIP (14-Nov-2006)
debugfs: stat <88539836>
Inode: 88539836 Type: directory Mode: 0755 Flags: 0x0 Generation: 791796957
User: 18804 Group: 23084 Size: 4096
File ACL: 0 Directory ACL: 0
Links: 2 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
Size of extra inode fields: 4
BLOCKS:
(0):177096928
TOTAL: 1
debugfs: ls <88539836>
88539836 (12) . 88539821 (32) .. 88541366 (12) .ok
88541465 (56) -inc_rss_item-32-wa.23d91cc2
88541539 (40) spip%3Fpage%3Djquery.cce608b6.gz
88540284 (40) spip%3Fpage%3Dforum-30.63b2c1b1
88541460 (28) spip%3Fmot5.f3e9adda
88541471 (160) -inc_rubriques-17-wa.f2f152f0
88541549 (24) INDEX-.edfac52c 88541578 (284) -inc_forum-10-wa.3cb1921f
88541562 (36) spip%3Farticle19.f8740dca
88541671 (3372) spip%3Fauteur1.c64f7f7e
debugfs: stat <88541562>
Inode: 88541562 Type: regular Mode: 0666 Flags: 0x0 Generation: 860068541
User: 18804 Group: 23084 Size: 0
File ACL: 0 Directory ACL: 0
Links: 0 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
Size of extra inode fields: 4
BLOCKS:
debugfs: dump <88539836> /tmp/corrupted_dir
(file attached)
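For anyone who wants to inspect the attached dump without debugfs, here is a minimal sketch of a parser for one linear ext2/ext3 directory block. It assumes the classic on-disk entry layout with the filetype feature enabled (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type, then the name, all little-endian). parse_dir_block and list_dump are hypothetical helper names, not part of e2fsprogs.

```python
import struct


def parse_dir_block(block):
    """Yield (inode, rec_len, name) for each in-use entry of a directory block.

    Assumes the ext2/ext3 entry layout with the filetype feature:
    inode (u32), rec_len (u16), name_len (u8), file_type (u8), name bytes.
    Entries with inode 0 are unused slots and are skipped.
    """
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len, _file_type = struct.unpack_from(
            "<IHBB", block, offset
        )
        # rec_len must cover at least the 8-byte header and stay in the block.
        if rec_len < 8 or offset + rec_len > len(block):
            raise ValueError(f"bad rec_len {rec_len} at offset {offset}")
        if inode != 0:
            name = block[offset + 8 : offset + 8 + name_len]
            yield inode, rec_len, name.decode("utf-8", "replace")
        offset += rec_len


def list_dump(path):
    """Print the entries of a dumped directory block, one per line."""
    with open(path, "rb") as fh:
        for inode, rec_len, name in parse_dir_block(fh.read()):
            print(f"{inode} ({rec_len}) {name}")
```

Running list_dump("/tmp/corrupted_dir") should list the same inode, rec_len and name triples as the debugfs ls output above, whose rec_len values add up to the 4096-byte block size.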
> You might want to try disabling the DIR_INDEX feature and see whether
> the corruption still occurs...
We'll try.
> > Keeping inodes into servers' cache seems to prevent the problem to happen.
> > ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
>
> I'd guess just because they don't have to be read from disk where they
> get corrupted.
Exactly.
> Interesting, but it may well be just by the way how these files get
> created / updated.
Yes, this is only because of that.
Additional data that may help: we replaced the storage server with
something slower (fewer CPUs, fewer cores, ...). We are still getting
some corruption, but on a scale not comparable to what we saw with the
former server.
The data are stored on two disk arrays. The primary one is made of
fiber-channel disks attached through a simple fiber-channel card, using
software RAID with md (RAID6). The secondary one is made of SCSI disks
attached through a hardware RAID card. We got corruption on both,
depending on which one was in production at the time.
Sylvain
[-- Attachment #1.2: corrupted_dir --]
[-- Type: application/octet-stream, Size: 4096 bytes --]