From: Bill Davidsen <davidsen@tmr.com>
To: whollygoat@letterboxes.org
Cc: linux-raid@vger.kernel.org, David Greaves <david@dgreaves.com>
Subject: Re: some ?? re failed disk and resyncing of array
Date: Sun, 01 Feb 2009 14:41:37 -0500
Message-ID: <4985FAF1.2090208@tmr.com>
In-Reply-To: <1233403388.29916.1297756217@webmail.messagingengine.com>
whollygoat@letterboxes.org wrote:
> On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> said:
>
>> whollygoat@letterboxes.org wrote:
>>
>>> On a boot a couple of days ago, mdadm failed a disk and
>>> started resyncing to spare (raid5, 6 drives, 5 active, 1
>>> spare). smartctl -H <disk> returned info (can't remember
>>> the exact text) that made me suspect the drive was
>>> fine, but the data connection was bad. Sure enough the
>>> data cable was damaged. Replaced the cable and smartctl
>>> sees the disk just fine and reports no errors.
>>>
>>> - I'd like to readd the drive as a spare. Is it enough
>>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
>>> remove any data that said where it previously belonged
>>> in the array?
>>>
>> That should work.
>> Any issues and you can zero the superblock (man mdadm)
>> No need to zero the disk.
>>
>
> Would --re-add be better?
>
>
I don't think so. And I would zero the superblock. The more detail you
put into preventing unwanted autodetection, the fewer learning
experiences you will have.
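
For what it's worth, a minimal sketch of the sequence I mean. It assumes
the array is /dev/md0 and that the repaired member is the partition
/dev/hdk1 (a guess based on the other member names; you wrote /dev/hdk,
so check which one actually held the superblock). The helper names run
and readd_as_spare are made up for illustration, and the script only
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: wipe stale md metadata from a former member, then add it back.
# ARRAY and DISK are assumptions for illustration; verify them before use.
ARRAY=${ARRAY:-/dev/md0}
DISK=${DISK:-/dev/hdk1}

run() {
    # Print the command; only execute it when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

readd_as_spare() {
    # $1 = array, $2 = device.
    # Erase the old superblock so the device carries no stale array identity.
    run mdadm --zero-superblock "$2" || return 1
    # The array already has all 5 members active, so the device joins as a spare.
    run mdadm "$1" --add "$2"
}

readd_as_spare "$ARRAY" "$DISK"
```

Read the printed commands first, then rerun with DRY_RUN=0 as root once
the device name is confirmed.
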
> I've noticed something else since I made the initial post
>
> --------- begin output -------------
> fly:~# mdadm -D /dev/md0
> /dev/md0:
> Version : 01.00.03
> Creation Time : Sun Jan 11 21:49:36 2009
> Raid Level : raid5
> Array Size : 312602368 (298.12 GiB 320.10 GB)
> Device Size : 156301184 (74.53 GiB 80.03 GB)
> Raid Devices : 5
> Total Devices : 5
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Fri Jan 30 15:52:01 2009
> State : active
> Active Devices : 5
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Name : fly:FlyFileServ_md (local to host fly)
> UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
> Events : 16
>
> Number Major Minor RaidDevice State
> 0 33 1 0 active sync /dev/hde1
> 1 34 1 1 active sync /dev/hdg1
> 2 56 1 2 active sync /dev/hdi1
> 5 89 1 3 active sync /dev/hdo1
> 6 88 1 4 active sync /dev/hdm1
>
>
> fly:~# mdadm -E /dev/hdo1
> /dev/hdo1:
> Magic : a92b4efc
> Version : 01
> Feature Map : 0x1
> Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
> Name : fly:FlyFileServ_md (local to host fly)
> Creation Time : Sun Jan 11 21:49:36 2009
> Raid Level : raid5
> Raid Devices : 5
>
> Device Size : 234436336 (111.79 GiB 120.03 GB)
> Array Size : 625204736 (298.12 GiB 320.10 GB)
> Used Size : 156301184 (74.53 GiB 80.03 GB)
> Super Offset : 234436464 sectors
> State : clean
> Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de
>
> Internal Bitmap : 2 sectors from superblock
> Update Time : Fri Jan 30 15:52:01 2009
> Checksum : 4689ff5 - correct
> Events : 16
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
> Array State : uuuUu 2 failed
> --------- end output -------------
>
> Why does the "Array Slot" field show 7 slots? And why
> does the field "Array State" show 2 failed? There
> were only ever 6 disks in the array. Only one of those
> is currently missing. mdadm -D above doesn't list any
> failed devices in the "Failed Devices" field.
>
>
No idea, but did you explicitly remove the failed drive? Was there a
failed drive at some time in the past?
I've never seen this, but I always explicitly remove failed drives,
which may or may not be related.
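
If you want to compare what each member's superblock reports, here is a
small sketch that just counts the entries on the "Array Slot" line of
mdadm -E output. It says nothing about why the stale slots are there.
The function name slot_summary is made up for illustration.

```shell
#!/bin/sh
# Sketch: summarise the "Array Slot" line of `mdadm -E` output read on stdin.
slot_summary() {
    # Keep only the parenthesised list, one entry per line, spaces removed.
    entries=$(sed -n 's/^.*Array Slot *: *[0-9]* *(\(.*\)).*$/\1/p' |
        tr ',' '\n' | tr -d ' ')
    # Count all non-empty entries, then those marked "failed".
    total=$(printf '%s\n' "$entries" | grep -c . || true)
    failed=$(printf '%s\n' "$entries" | grep -c '^failed$' || true)
    echo "slots=$total failed=$failed"
}

# Example using the line from the output quoted above:
echo '     Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)' | slot_summary
```

On the quoted line this prints "slots=7 failed=2", i.e. the 5 active
members plus the 2 entries marked failed. On a live system it would be
fed with something like: mdadm -E /dev/hdo1 | slot_summary
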
> Thanks for your answers below as well. It's kind of
> what I was expecting. There was a h/w problem that
> took ages to track down and I think it was responsible
> for all the e2fs errors.
>
> WG
>
>
>>> - When I tried to list some files on one of the filesystems
>>> on the array (the fact that it took so long to react to
>>> the ls is how I discovered the box was in the middle of
>>> rebuilding to spare)
>>>
>> This is OK - resync involves a lot of IO and can slow things down. This
>> is tuneable.
>>
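
The knobs David means are, I believe, the md speed limits under
/proc/sys/dev/raid (speed_limit_min and speed_limit_max, in KiB/s per
device). A minimal sketch for looking at them and lowering the ceiling
follows; the function names and the RAID_SYSCTL_DIR variable are made up
for illustration, the latter only so the functions can be tried against
a scratch directory first.

```shell
#!/bin/sh
# Sketch: show, and optionally lower, the md resync speed limits.
# RAID_SYSCTL_DIR is a hypothetical override for trying this on a scratch dir.
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}

show_resync_limits() {
    # Both values are in KiB/s per device.
    for f in speed_limit_min speed_limit_max; do
        echo "$f=$(cat "$RAID_SYSCTL_DIR/$f")"
    done
}

set_resync_max() {
    # $1 = new ceiling in KiB/s; needs root on a real system.
    echo "$1" > "$RAID_SYSCTL_DIR/speed_limit_max"
}
```

Lowering speed_limit_max while a rebuild runs leaves more bandwidth for
ordinary I/O at the cost of a longer rebuild. The setting does not
survive a reboot.
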
>>
>>> it couldn't find the file (or many
>>> others). I thought that resyncing was supposed to be
>>> transparent, yet parts of the fs seemed to be missing.
>>> Everything was there afterwards. Is that normal?
>>>
>> No. This is nothing to do with normal md resyncing and certainly not
>> expected.
>>
>>
>>> - On a subsequent boot I had to run e2fsck on the three
>>> filesystems housed on the array. Many stray blocks,
>>> illegal inodes, etc were found. An artifact of the rebuild
>>> or unrelated?
>>>
>> Well, you had a fault in your IO system; there's a good chance your O
>> broke.
>>
>> Verify against a backup.
>>
>> David
>>
>>
>> --
>> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
>>
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck