From: "Pierre Vignéras" <pierre@vigneras.name>
To: Leslie Rhorer <lrhorer@satx.rr.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm: failed devices become spares!
Date: Mon, 17 May 2010 20:10:36 +0200
Message-ID: <201005172010.36157.pierre@vigneras.name>
In-Reply-To: <9D.D3.23029.CDD40FB4@cdptpa-omtalb.mail.rr.com>
On Sunday 16 May 2010, Leslie Rhorer wrote:
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of Pierre Vignéras
> > Sent: Sunday, May 16, 2010 10:41 AM
> > To: linux-raid@vger.kernel.org
> > Subject: mdadm: failed devices become spares!
> >
> > Hi,
> >
> > I encountered a critical problem with mdadm that I submitted to the
> > Debian mailing list (the system is Debian Lenny/stable). They asked me
> > to submit it to this list, so that is what I am doing.
> >
> > To avoid duplicating the description, here is the URL of the bug
> > report:
> >
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352
> >
> > If you would prefer the full details to be copied and pasted into this
> > mailing list, just ask.
> >
> > Note: the bug happened again today, on another RAID array. So the good
> > news is that it is somewhat reproducible! The bad news is that unless
> > you have a magic solution, all my data is lost (half of it was still in
> > the backup pipeline!)...
> >
> > Thanks for any help, and regards.
> > --
> > Pierre Vignéras
>
> It's not quite clear to me from the link whether your drives are
> truly toast or not. If they are, then you are hosed. Assuming not, you
> need to use
>
> `mdadm --examine /dev/sdxx` and `mdadm -Dt /dev/mdyy`
>
> to determine precisely all the parameters and the order of the block
> devices in the array. You need the chunk size, the superblock type, which
> slot was occupied by each device in the array (this may not be the same as
> when the array was created), the size of the array (if it did not fill the
> entire partition in every case), the RAID level, etc. Once you are certain
> you have all the information needed to re-create the array, if need
> be, then try to re-assemble the array with
>
> `mdadm --assemble --force /dev/mdyy`
>
> If it works, then fsck the file system. (I think I noticed you are
> using XFS. If so, do not use xfs_check. Instead, use xfs_repair with the
> -n option.) After you have a clean file system, issue the command
>
> `echo repair > /sys/block/mdyy/md/sync_action`
>
> to re-sync the array. If the array does not assemble, then you will
> need to stop it and re-create it using the options you obtained from your
> research above, adding the --assume-clean switch to prevent a resync in
> case something is wrong. If the fsck won't work after re-creating the array,
> then you probably got one or more of the parameters incorrect.
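>
> 	For instance, a minimal sketch only -- every value and device name
> below is a placeholder, and you must substitute the metadata version,
> level, device count, chunk size, layout, and device order that your own
> --examine output actually reports:
>
> `mdadm --stop /dev/mdyy`
> `mdadm --create /dev/mdyy --assume-clean --metadata=0.90 --level=10 \
>     --raid-devices=4 --chunk=64 --layout=n2 \
>     /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1`
>
> 	With --assume-clean, only new superblocks should be written, so if
> the device order turns out to be wrong you can stop the array and try
> another order without the data being overwritten.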
Thanks for your help. Here is what I did:
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
[...]
md2 : inactive sdc1[2](S) sdd1[5](S) sdf1[4](S) sde1[3](S)
1250274304 blocks
[...]
# mdadm --examine /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sde1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
Creation Time : Thu Aug 6 01:59:44 2009
Raid Level : raid10
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 625137152 (596.18 GiB 640.14 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Update Time : Tue Apr 13 19:22:21 2010
State : clean
Internal Bitmap : present
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2
Checksum : 5baf7939 - correct
Events : 90612
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 33 2 active sync /dev/sdc1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 spare /dev/sdf1
5 5 8 49 5 spare /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
Creation Time : Thu Aug 6 01:59:44 2009
Raid Level : raid10
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 625137152 (596.18 GiB 640.14 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Update Time : Tue Apr 13 19:22:21 2010
State : clean
Internal Bitmap : present
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2
Checksum : 5baf7949 - correct
Events : 90612
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 49 5 spare /dev/sdd1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 spare /dev/sdf1
5 5 8 49 5 spare /dev/sdd1
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.00
UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
Creation Time : Thu Aug 6 01:59:44 2009
Raid Level : raid10
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 625137152 (596.18 GiB 640.14 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Update Time : Tue Apr 13 19:22:21 2010
State : clean
Internal Bitmap : present
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2
Checksum : 5baf7967 - correct
Events : 90612
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 81 4 spare /dev/sdf1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 spare /dev/sdf1
5 5 8 49 5 spare /dev/sdd1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
Creation Time : Thu Aug 6 01:59:44 2009
Raid Level : raid10
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Array Size : 625137152 (596.18 GiB 640.14 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Update Time : Tue Apr 13 19:22:21 2010
State : clean
Internal Bitmap : present
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2
Checksum : 5baf795b - correct
Events : 90612
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 spare /dev/sdf1
5 5 8 49 5 spare /dev/sdd1
# mdadm -Dt /dev/md2
mdadm: md device /dev/md2 does not appear to be active.
phobos:~#
# mdadm --assemble --force /dev/md2
mdadm: /dev/md2 assembled from 2 drives and 2 spares - not enough to start the
array.
#
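One thing I notice in the output above: all four superblocks carry the
same event count (90612) and the same update time, which can be checked
at a glance with:
# mdadm --examine /dev/sd[cdef]1 | grep -E 'Events|Update Time'
So the four superblocks at least agree with each other; the oddity is
purely that two of them now describe their devices as spares.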
What I don't get is how devices /dev/sdf1 and /dev/sdd1 came to be
marked as spares after being marked as faulty! I never asked for that. As
shown at the Debian bug link (repeated here for your convenience):
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352
<bug description extract>
...
Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2,
component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdf1
Is that last line normal? It seems to me that this failed drive has
been made a spare! (I really hope I have misunderstood something.) Is
it possible that the USB subsystem (with its sort-of "plug and play"
behaviour) made mdadm behave so strangely?
</bug description extract>
The next question is: how do I activate those two spare drives? I was
expecting mdadm to use them automagically.
Did I miss something, or is there something really strange happening there?
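In case it helps, here is what I am tempted to try next, built from the
parameters that --examine reports above (raid10, 4 devices, 64K chunks,
near=2 layout, 0.90 superblocks). It is only a sketch: I do not know which
of /dev/sdf1 and /dev/sdd1 originally occupied slot 0 and which slot 1, so
the order below is a guess and the other order may have to be tried too:
# mdadm --stop /dev/md2
# mdadm --create /dev/md2 --assume-clean --metadata=0.90 --level=10 \
        --raid-devices=4 --chunk=64 --layout=n2 \
        /dev/sdf1 /dev/sdd1 /dev/sdc1 /dev/sde1
# xfs_repair -n /dev/md2
If I understand --assume-clean correctly, only the superblocks would be
rewritten, so a wrong guess could be corrected by stopping the array and
re-creating it in the other order. Please tell me if this is a bad idea
before I destroy what is left of my data!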
Thanks again.
--
Pierre Vignéras