From: Bas van Schaik <bas@tuxes.nl>
To: linux-raid@vger.kernel.org
Subject: Redundancy check using "echo check > sync_action": error reporting?
Date: Sun, 16 Mar 2008 15:21:11 +0100
Message-ID: <47DD2CD7.2090802@tuxes.nl>
Hi all,
As we speak, I'm trying to debug a really weird type of filesystem
corruption in a quite complex layered system with networking involved:
ATA over Ethernet - RAID5 - LVM - CryptoLoop - EXT3
In plain English: four storage servers export a bunch of block devices
using AoE, the "cluster frontend" uses those devices to build three
RAID5 arrays. Those arrays are the basis of a large LVM volume group, in
which a Logical Volume was created with an encrypted 2.5TB EXT3
filesystem (cryptoloop).
Recently the system suffered massive filesystem corruption, which even
made e2fsck crash. Theodore Tso was able to analyze and fix the
filesystem partially and found out that some random garbage was written
to the EXT3 inode tables, as well as some other weird corruptions.
Personally, I suspect that one of the storage servers or the network
caused these severe corruptions, but I have never seen any errors
at the RAID5 level.
The (Debian) system runs a monthly check of the RAID5 arrays using Martin
F. Krafft's checkarray script. Basically this script performs an "echo
check > /sys/block/$array/md/sync_action" for all arrays. With my
(basic) knowledge of RAID5 I assume this check only recomputes the parity
of each stripe and compares it to the stored XOR'ed value. This makes me wonder:
1) Will the kernel actually warn me when an inconsistency is found?
Reading some other posts on the list, it seems the kernel will print a
"read error corrected!" message. Is that correct? Note that I'm using kernel
2.6.18 (Debian stable); was it already implemented that way in that kernel?
2) How can the RAID code actually correct such a read error on RAID5?
How does it know which device actually contains the faulty data?
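To make question 2 concrete, here is a minimal Python sketch of my
understanding of single-parity RAID5. It is only an illustration under my
own assumptions, not the md code: the helper names xor_blocks and
stripe_is_consistent and the tiny 4-byte blocks are made up for this
example. It shows that recomputing the parity can detect that a stripe is
inconsistent, but that the XOR alone does not say which member is wrong.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """Recompute the parity from the data blocks and compare it to the stored one."""
    return xor_blocks(data_blocks) == parity


# Hypothetical stripe on four devices: three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)
assert stripe_is_consistent(data, parity)

# Silently flip one bit in the second data block (no read error is reported).
corrupted = [data[0], b"\x10\x20\x30\x41", data[2]]
assert not stripe_is_consistent(corrupted, parity)

# Rebuild each data block in turn from the remaining blocks plus parity.
# Every rebuild yields *some* block; only a comparison against the original
# data (which a real array does not have) reveals which rebuild is correct.
for missing in range(len(corrupted)):
    others = [blk for i, blk in enumerate(corrupted) if i != missing]
    rebuilt = xor_blocks(others + [parity])
    print(missing, rebuilt == data[missing])
```

In this sketch the check notices the mismatch, yet nothing in the stripe
itself singles out the second block as the culprit, which is exactly why I
am asking how the RAID code decides what to "correct".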
The answers to those questions are very important to me: if the kernel
actually warns me when an inconsistency is found, that rules out the
possibility that there is something wrong with the network or one of the
storage servers. Actually that would mean that the "cluster frontend" is
causing the corruptions.
Kind regards,
-- Bas van Schaik
Thread overview: 44+ messages
2008-03-16 14:21 Bas van Schaik [this message]
2008-03-16 15:14 ` Redundancy check using "echo check > sync_action": error reporting? Janek Kozicki
2008-03-20 13:32 ` Bas van Schaik
2008-03-20 13:47 ` Robin Hill
2008-03-20 14:19 ` Bas van Schaik
2008-03-20 14:45 ` Robin Hill
2008-03-20 15:16 ` Bas van Schaik
2008-03-20 16:04 ` Robin Hill
2008-03-20 16:35 ` Theodore Tso
2008-03-20 17:10 ` Robin Hill
2008-03-20 17:39 ` Andre Noll
2008-03-20 18:02 ` Theodore Tso
2008-03-20 18:57 ` Andre Noll
2008-03-21 14:02 ` Ric Wheeler
2008-03-21 20:19 ` NeilBrown
2008-03-21 20:45 ` Ric Wheeler
2008-03-22 17:13 ` Bill Davidsen
2008-03-20 23:08 ` Peter Rabbitson
2008-03-21 14:24 ` Bill Davidsen
2008-03-21 14:52 ` Peter Rabbitson
2008-03-21 17:13 ` Theodore Tso
2008-03-21 17:35 ` Peter Rabbitson
2008-03-22 13:27 ` Theodore Tso
2008-03-22 14:00 ` Bas van Schaik
2008-03-25 4:44 ` Neil Brown
2008-03-25 15:17 ` Bill Davidsen
2008-03-25 9:19 ` Mattias Wadenstein
2008-03-21 17:43 ` Robin Hill
2008-03-21 23:01 ` Bill Davidsen
2008-03-21 23:45 ` Carlos Carvalho
2008-03-22 17:19 ` Bill Davidsen
2008-03-21 23:55 ` Robin Hill
2008-03-22 10:03 ` Peter Rabbitson
2008-03-22 10:42 ` What do Events actually mean? Justin Piszcz
2008-03-22 17:35 ` David Greaves
2008-03-22 17:48 ` Justin Piszcz
2008-03-22 18:02 ` David Greaves
2008-03-25 3:58 ` Neil Brown
2008-03-26 8:57 ` David Greaves
2008-03-26 8:57 ` David Greaves
2008-05-04 7:30 ` Redundancy check using "echo check > sync_action": error reporting? Peter Rabbitson
2008-05-06 6:36 ` Luca Berra
2008-03-25 4:24 ` Neil Brown
2008-03-25 9:00 ` Peter Rabbitson