From: "Patrick H." <linux-raid@feystorm.net>
To: linux-raid@vger.kernel.org
Subject: filesystem corruption
Date: Sun, 02 Jan 2011 18:58:34 -0700
Message-ID: <4D212D4A.3040003@feystorm.net>

I've been trying to track down an issue for a while now, and from digging
around it appears (though I'm not certain) that the issue lies with the md
raid device.
What's happening is that after a raid-5 array is shut down improperly, a
few files on the filesystem are corrupt upon reassembly. I don't think
this is ordinary filesystem corruption from files being modified during
the shutdown, because some of the files that end up corrupted are several
hours old.

The exact details of what I'm doing:
I have a 3-node test cluster I'm doing integrity testing on. Each node 
in the cluster is exporting a couple of disks via ATAoE.
I have the first disk of all 3 nodes in a raid-1 that is holding the 
journal data for the ext3 filesystem. The array is running with an 
internal bitmap as well.
The second disk of all 3 nodes is a raid-5 array holding the ext3 
filesystem itself. This is also running with an internal bitmap.
The ext3 filesystem is mounted with 'data=journal,barrier=1,sync'.
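For anyone trying to reproduce this, the layout above can be sketched roughly as follows. This is a minimal Python sketch, not the actual cluster scripts: the device names (/dev/etherd/e<node>.<slot>), the array names /dev/md0 and /dev/md1, the 4096-byte block size and the /mnt/test mount point are all assumptions. It only builds the command lines; it does not run them.

```python
from typing import List


# Hypothetical naming: the aoe driver exposes exported disks as
# /dev/etherd/e<shelf>.<slot>; here shelf = node index, slot 0 = journal
# disk, slot 1 = data disk. Adjust to the real cluster layout.
def aoe_dev(node: int, slot: int) -> str:
    return f"/dev/etherd/e{node}.{slot}"


def build_setup_commands(nodes: List[int]) -> List[List[str]]:
    """Return argv lists (not executed) for the layout described above."""
    journal_members = [aoe_dev(n, 0) for n in nodes]
    data_members = [aoe_dev(n, 1) for n in nodes]
    return [
        # raid-1 across the first disk of every node, holding the ext3 journal.
        ["mdadm", "--create", "/dev/md0", "--level=1",
         f"--raid-devices={len(nodes)}", "--bitmap=internal", *journal_members],
        # raid-5 across the second disk of every node, holding the filesystem.
        ["mdadm", "--create", "/dev/md1", "--level=5",
         f"--raid-devices={len(nodes)}", "--bitmap=internal", *data_members],
        # External journal device; its block size must match the filesystem's.
        ["mke2fs", "-O", "journal_dev", "-b", "4096", "/dev/md0"],
        ["mke2fs", "-t", "ext3", "-b", "4096", "-J", "device=/dev/md0", "/dev/md1"],
        # Mount options as described; /mnt/test is an assumed mount point.
        ["mount", "-t", "ext3", "-o", "data=journal,barrier=1,sync",
         "/dev/md1", "/mnt/test"],
    ]


if __name__ == "__main__":
    for argv in build_setup_commands([0, 1, 2]):
        print(" ".join(argv))
```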
When I power down the node which is actively running both md raid 
devices, another node in the cluster takes over and starts both arrays 
up (in degraded mode of course).
Once the original node comes back up, the new master re-adds that node's
disks to the raid arrays and re-syncs them.
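The takeover and re-add steps could look something like the following. Again this is a hypothetical sketch under the same naming assumptions, not the real clustering scripts: the surviving node assembles each array from the remaining members with --run (so it starts degraded), and later uses --re-add for the returning node's disks. With an internal bitmap, a re-add should only resync the regions written while the member was missing rather than the whole disk.

```python
from typing import Dict, List


# Hypothetical naming, same as for the setup: /dev/etherd/e<node>.<slot>.
def aoe_dev(node: int, slot: int) -> str:
    return f"/dev/etherd/e{node}.{slot}"


# Assumed mapping: md array -> disk slot its members use on every node.
ARRAYS: Dict[str, int] = {"/dev/md0": 0, "/dev/md1": 1}


def takeover_commands(nodes: List[int], failed: int) -> List[List[str]]:
    """Assemble each array from the surviving nodes' disks only.

    --run starts an array even though a member is missing, i.e. degraded."""
    survivors = [n for n in nodes if n != failed]
    return [
        ["mdadm", "--assemble", "--run", md, *(aoe_dev(n, slot) for n in survivors)]
        for md, slot in ARRAYS.items()
    ]


def readd_commands(returned: int) -> List[List[str]]:
    """Return the rejoining node's disks to both arrays.

    With an internal bitmap, --re-add should resync only the chunks that were
    written while the member was absent, not the whole disk."""
    return [["mdadm", md, "--re-add", aoe_dev(returned, slot)]
            for md, slot in ARRAYS.items()]


if __name__ == "__main__":
    for argv in takeover_commands([0, 1, 2], failed=0) + readd_commands(0):
        print(" ".join(argv))
```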
During all this, the filesystem is exported through nfs (nfs also has 
sync turned on) and a client is randomly creating, removing, and 
verifying checksums on the files in the filesystem (nfs is hard mounted 
so operations always retry). The client script averages about 30 
creations/s, 30 deletes/s, and 30 checksums/s.

So, as stated above, every now and then (roughly 1 in 50 hard reboots of
the master), the client detects a few files with invalid md5 checksums.
These files can be hours old, so they were not being actively modified.
Another key point that leads me to believe it's an md raid issue: I
previously had the ext3 journal stored internally on the raid-5 array (as
part of the filesystem itself). With that setup there was occasionally
massive corruption: file modification times in the future, lots of
corrupt files, thousands of files put in the 'lost+found' dir upon fsck,
etc. Since I moved the journal to a separate raid-1, there have been no
more invalid modification times, not a single file has been added to
'lost+found', and the number of corrupt files has dropped significantly.
This would seem to indicate that the journal was getting corrupted, and
when it was played back, things went horribly wrong.

So it would seem there's something wrong with the raid-5 array, but I
don't know what it could be. Any ideas or input would be much
appreciated. I can modify the clustering scripts to capture whatever
information is needed when they start the arrays.
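As one possible starting point for that information, the takeover script could snapshot each member's superblock and bitmap before assembling, plus /proc/mdstat. This is a hypothetical Python sketch: collect_md_state and the runner hook are made up for illustration, and while mdadm --examine and --examine-bitmap are real mdadm options, which output actually helps diagnose this is a guess.

```python
import subprocess
import time
from typing import Callable, List

# A runner takes an argv list and returns the command's combined output.
Runner = Callable[[List[str]], str]


def run(argv: List[str]) -> str:
    """Run a command; return stdout+stderr instead of raising on failure."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError as exc:  # e.g. mdadm not installed
        return f"[could not run {argv[0]}: {exc}]"
    return proc.stdout + proc.stderr


def collect_md_state(members: List[str], runner: Runner = run) -> str:
    """Snapshot md state on the new master before it assembles the arrays.

    Meant to be called from the (hypothetical) takeover script so the
    superblocks are captured as the failed master left them."""
    parts = [f"=== md snapshot {time.strftime('%Y-%m-%d %H:%M:%S')} ==="]
    for dev in members:
        # Superblock: event count, array state, device roles.
        parts.append(f"--- mdadm --examine {dev}")
        parts.append(runner(["mdadm", "--examine", dev]))
        # Internal write-intent bitmap: dirty-chunk count, bitmap events.
        parts.append(f"--- mdadm --examine-bitmap {dev}")
        parts.append(runner(["mdadm", "--examine-bitmap", dev]))
    parts.append("--- /proc/mdstat")
    parts.append(runner(["cat", "/proc/mdstat"]))
    return "\n".join(parts)
```

A call such as collect_md_state(["/dev/etherd/e1.1", "/dev/etherd/e2.1"]) (device names assumed) would return one text report that the script can append to a log before running mdadm --assemble.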

-Patrick
