From: Bill Davidsen <davidsen@tmr.com>
To: whollygoat@letterboxes.org
Cc: linux-raid@vger.kernel.org, David Greaves <david@dgreaves.com>
Subject: Re: some ?? re failed disk and resyncing of array
Date: Sun, 01 Feb 2009 14:41:37 -0500
Message-ID: <4985FAF1.2090208@tmr.com>
In-Reply-To: <1233403388.29916.1297756217@webmail.messagingengine.com>

whollygoat@letterboxes.org wrote:
> On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> said:
>
>> whollygoat@letterboxes.org wrote:
>>
>>> On a boot a couple of days ago, mdadm failed a disk and
>>> started resyncing to spare (raid5, 6 drives, 5 active, 1
>>> spare). smartctl -H <disk> returned info (can't remember
>>> the exact text) that made me suspect the drive was
>>> fine, but the data connection was bad. Sure enough the
>>> data cable was damaged. Replaced the cable and smartctl
>>> sees the disk just fine and reports no errors.
>>>
>>> - I'd like to readd the drive as a spare. Is it enough
>>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
>>> remove any data that said where it previously belonged
>>> in the array?
>>>
>> That should work.
>> Any issues and you can zero the superblock (man mdadm)
>> No need to zero the disk.
>>
>
> Would --re-add be better?
>
>
I don't think so. And I would zero the superblock. The more detail you
put into preventing unwanted autodetection, the fewer learning
experiences you will have.
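
A rough sketch of that sequence, for reference. The partition name
/dev/hdk1 is an assumption (the original post only says /dev/hdk, while
the other members are partitions), and the run wrapper with DRY_RUN is
purely illustrative: it only prints the commands until DRY_RUN=0 is set,
so nothing is touched before the device names have been checked.

```shell
#!/bin/sh
# Sketch only: ARRAY and DISK are assumptions, adjust them before use.
ARRAY=${ARRAY:-/dev/md0}
DISK=${DISK:-/dev/hdk1}    # assumed partition name, not confirmed in the thread
DRY_RUN=${DRY_RUN:-1}      # 1 = print the commands, 0 = actually run them

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

readd_as_spare() {
    run mdadm --examine "$DISK"          # look at the stale superblock first
    run mdadm --zero-superblock "$DISK"  # wipe only the md metadata, not the disk
    run mdadm "$ARRAY" --add "$DISK"     # add the device back; it becomes a spare
    run mdadm --detail "$ARRAY"          # confirm the spare count went up
}

readd_as_spare
```

Running it with DRY_RUN=0 as root would execute the four commands in
order; the default only lists them.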
> I've noticed something else since I made the initial post
>
> --------- begin output -------------
> fly:~# mdadm -D /dev/md0
> /dev/md0:
> Version : 01.00.03
> Creation Time : Sun Jan 11 21:49:36 2009
> Raid Level : raid5
> Array Size : 312602368 (298.12 GiB 320.10 GB)
> Device Size : 156301184 (74.53 GiB 80.03 GB)
> Raid Devices : 5
> Total Devices : 5
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Fri Jan 30 15:52:01 2009
> State : active
> Active Devices : 5
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Name : fly:FlyFileServ_md (local to host fly)
> UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
> Events : 16
>
> Number Major Minor RaidDevice State
> 0 33 1 0 active sync /dev/hde1
> 1 34 1 1 active sync /dev/hdg1
> 2 56 1 2 active sync /dev/hdi1
> 5 89 1 3 active sync /dev/hdo1
> 6 88 1 4 active sync /dev/hdm1
>
>
> fly:~# mdadm -E /dev/hdo1
> /dev/hdo1:
> Magic : a92b4efc
> Version : 01
> Feature Map : 0x1
> Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
> Name : fly:FlyFileServ_md (local to host fly)
> Creation Time : Sun Jan 11 21:49:36 2009
> Raid Level : raid5
> Raid Devices : 5
>
> Device Size : 234436336 (111.79 GiB 120.03 GB)
> Array Size : 625204736 (298.12 GiB 320.10 GB)
> Used Size : 156301184 (74.53 GiB 80.03 GB)
> Super Offset : 234436464 sectors
> State : clean
> Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de
>
> Internal Bitmap : 2 sectors from superblock
> Update Time : Fri Jan 30 15:52:01 2009
> Checksum : 4689ff5 - correct
> Events : 16
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
> Array State : uuuUu 2 failed
> --------- end output -------------
>
> Why does the "Array Slot" field show 7 slots? And why
> does the field "Array State" show 2 failed? There
> were only ever 6 disks in the array. Only one of those
> is currently missing. mdadm -D above doesn't list any
> failed devices in the "Failed Devices" field.
>
>
No idea, but did you explicitly remove the failed drive? Was there a
failed drive at some time in the past?
I've never seen this, but I always remove drives, which may or may not
be related.
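
If you want to watch whether that slot list changes after you remove or
add a device, something like the sketch below can summarise the
"Array Slot" line from mdadm -E output. The function name slot_summary
is made up for illustration, and the parsing assumes the line format
shown in your output above; it does not explain why the extra slots are
there.

```shell
#!/bin/sh
# Summarise the "Array Slot" line of `mdadm -E` output read from stdin.
# Assumes the format "Array Slot : N (a, b, failed, ...)" seen in this thread.
slot_summary() {
    # Keep only the comma-separated list inside the parentheses.
    sed -n 's/^.*Array Slot *: *[0-9][0-9]* *(\(.*\)).*$/\1/p' |
    tr ',' '\n' |
    awk '
        { gsub(/[ \t]/, ""); if ($0 == "") next; total++ }
        $0 == "failed" { failed++ }
        END { printf "slots=%d failed=%d\n", total, failed }
    '
}

# Example with the line quoted above:
printf '%s\n' 'Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)' | slot_summary
# prints: slots=7 failed=2
```

On a live system the input would come from "mdadm -E /dev/hdo1 | slot_summary".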
> Thanks for your answers below as well. It's kind of
> what I was expecting. There was a h/w problem that
> took ages to track down and I think it was responsible
> for all the e2fs errors.
>
> WG
>
>
>>> - When I tried to list some files on one of the filesystems
>>> on the array (the fact that it took so long to react to
>>> the ls is how I discovered the box was in the middle of
>>> rebuilding to spare)
>>>
>> This is OK - resync involves a lot of IO and can slow things down. This
>> is tuneable.
>>
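
On the "tuneable" remark: the knobs usually meant here are the md
speed limits under /proc/sys/dev/raid, given in KiB/s per device. A
small sketch for looking at them follows. The RAID_SYSCTL_DIR variable
and the function name are only there for illustration, and the value
10000 in the comment is an arbitrary example, not a recommendation.

```shell
#!/bin/sh
# Show the md resync speed limits. RAID_SYSCTL_DIR is a hypothetical
# override for illustration; the default is the real kernel location.
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}

show_resync_limits() {
    for name in speed_limit_min speed_limit_max; do
        file="$RAID_SYSCTL_DIR/$name"
        if [ -r "$file" ]; then
            printf '%s = %s KiB/s\n' "$name" "$(cat "$file")"
        else
            printf '%s = (not available)\n' "$name"
        fi
    done
}

show_resync_limits

# To lower the ceiling so a resync competes less with normal I/O (as root):
#   echo 10000 > /proc/sys/dev/raid/speed_limit_max
```

Lowering speed_limit_max makes the rebuild take longer but leaves more
bandwidth for ordinary reads such as the slow ls described above.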
>>
>>> it couldn't find the file (or many
>>> others). I thought that resyncing was supposed to be
>>> transparent, yet parts of the fs seemed to be missing.
>>> Everything was there afterwards. Is that normal?
>>>
>> No. This is nothing to do with normal md resyncing and certainly not
>> expected.
>>
>>
>>> - On a subsequent boot I had to run e2fsck on the three
>>> filesystems housed on the array. Many stray blocks,
>>> illegal inodes, etc were found. An artifact of the rebuild
>>> or unrelated?
>>>
>> Well, you had a fault in your IO system there's a good chance your O
>> broke.
>>
>> Verify against a backup.
>>
>> David
>>
>>
>> --
>> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
>>
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck