From: Craig Shelley <craig@microtron.org.uk>
To: reiserfs-list@namesys.com
Subject: A Word of Warning about Linux Software Raid
Date: Fri, 11 Aug 2006 19:09:46 +0100
Message-ID: <1155319786.8121.43.camel@localhost.localdomain>

Hi all,

I have a little story that taught me some very important lessons
about Linux software RAID1 (mirroring).

A local power outage caused my system to turn off in a very rough way.
The power didn't go off cleanly; instead it toggled on and off a few
times in quick succession before finally staying off.

When the power was restored my reiser4 partitions were a bit poorly, and
required some attention with fsck.reiser4.

Ever since this event, reiser4 warnings have often been displayed on the
console on unmount when shutting down/rebooting. Each time I saw the
messages I ran fsck.reiser4, which sometimes found and fixed errors.
Not knowing which partition was causing the problem was a bit annoying,
since I have 4 reiser4 partitions.

Yesterday, after running fsck.reiser4, the system would no longer boot.
Further runs of fsck.reiser4 would sometimes find more errors, and a few
minutes later would find none at all. At this point I began to wonder if
my SATA controller had gone faulty, since the hardware appeared to be
time-variant.

Eventually I traced the problem to the data on the two mirrored disks
not being identical. It seems that the kernel does not compare the two
copies on a RAID1 mirror when reading, and returns a "mix" of data from
each disk as it is accessed. Over time, bad shutdowns/crashes lead to
differences between the data on the two mirrored disks, and this can
eventually have catastrophic consequences.


I re-synced the disks using the following commands (let me know if
there is a nicer way):

prometheus:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      4883648 blocks [2/2] [UU]
...

prometheus:~# mdadm --manage --fail /dev/md0 /dev/hdc1
mdadm: set /dev/hdc1 faulty in /dev/md0
prometheus:~# mdadm --manage --remove /dev/md0 /dev/hdc1
mdadm: hot removed /dev/hdc1
prometheus:~# mdadm --manage --add /dev/md0 /dev/hdc1
mdadm: hot added /dev/hdc1

prometheus:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 hdc1[2] hda1[0]
      4883648 blocks [2/1] [U_]
      [====>................]  recovery = 22.4% (1098368/4883648) finish=3.0min speed=20364K/sec
...


fsck.reiser4 could then be run to properly fix the errors.
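
The fail/remove/add sequence above can be wrapped in a small shell
function. This is only a sketch: resync_member and the RUN variable are
names made up for illustration, and the mdadm invocations are the same
three shown in the transcript. By default it only prints the commands;
nothing is executed unless RUN=1 is set. Note that the re-added member
is overwritten with the contents of the disk that stays in the array,
so the choice of which member to fail matters.

```shell
#!/bin/sh
# Sketch: force a full re-sync of one RAID1 member by failing,
# removing and re-adding it (same three steps as in the transcript).
# Prints the commands by default; set RUN=1 to actually execute them.
# WARNING: the re-added member is rebuilt from the remaining disk.
resync_member() {
    array=$1     # e.g. /dev/md0
    member=$2    # e.g. /dev/hdc1
    if [ -z "$array" ] || [ -z "$member" ]; then
        echo "usage: resync_member ARRAY MEMBER" >&2
        return 2
    fi
    for action in --fail --remove --add; do
        if [ "${RUN:-0}" = 1 ]; then
            mdadm --manage "$action" "$array" "$member" || return 1
        else
            echo "mdadm --manage $action $array $member"
        fi
    done
}
```

Calling resync_member /dev/md0 /dev/hdc1 without RUN=1 just lists the
three commands, which is a cheap way to double-check the device names
before committing to the rebuild. Progress can be followed in
/proc/mdstat as shown above.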

I checked several other systems that I admin, and after re-syncing the
mirrored partitions on each system, errors were found on their
filesystems. 

It would be nice if, just as the kernel can hot-add a disk to the mirror
and copy the data across in the background, it could also be told to run
a background consistency check on the raid array and report/fix errors
as it goes.
Are there any tools to do this or similar?
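
One possible route, assuming a kernel recent enough to expose the md
sysfs interface (I have not verified which of the systems above do), is
to write "check" to the array's sync_action file and read mismatch_cnt
afterwards. The sketch below does just that. The function names and the
SYSFS override are made up for illustration; the override exists only so
the functions can be tried against a scratch directory instead of a
real array.

```shell
#!/bin/sh
# Sketch: request a background consistency check of an md array and
# read back the mismatch counter. Assumes the kernel provides
# /sys/block/<md>/md/sync_action and mismatch_cnt.
start_check() {
    md=$1                                  # array name, e.g. md0
    dir="${SYSFS:-/sys}/block/$md/md"
    if [ ! -w "$dir/sync_action" ]; then
        echo "no writable md sysfs interface for $md" >&2
        return 1
    fi
    # "check" only counts differences; writing "repair" instead asks
    # md to rewrite mismatched blocks as well.
    echo check > "$dir/sync_action"
}

mismatches() {
    # Sectors found to differ by the last check; 0 means the copies agreed.
    cat "${SYSFS:-/sys}/block/$1/md/mismatch_cnt"
}
```

Usage would be start_check md0, then waiting for the check to finish
(progress appears in /proc/mdstat) before calling mismatches md0. A
non-zero count does not say which copy is the good one, so it is a
prompt to investigate rather than a fix in itself.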

Although this is not a reiser4 issue, I thought it was important that I
make everyone aware of it.

Regards,

-- 
Craig Shelley
EMail: craig@microtron.org.uk
Jabber: shell@jabber.earth.li


