* some ?? re failed disk and resyncing of array
@ 2009-01-31 8:16 whollygoat
2009-01-31 10:38 ` David Greaves
0 siblings, 1 reply; 8+ messages in thread
From: whollygoat @ 2009-01-31 8:16 UTC (permalink / raw)
To: linux-raid
On a boot a couple of days ago, mdadm failed a disk and
started resyncing to spare (raid5, 6 drives, 5 active, 1
spare). smartctl -H <disk> returned info (can't remember
the exact text) that made me suspect the drive was
fine, but the data connection was bad. Sure enough the
data cable was damaged. Replaced the cable and smartctl
sees the disk just fine and reports no errors.
- I'd like to readd the drive as a spare. Is it enough
to "mdadm --add /dev/hdk" or do I need to prep the drive to
remove any data that said where it previously belonged
in the array?
- When I tried to list some files on one of the filesystems
on the array (the fact that it took so long to react to
the ls is how I discovered the box was in the middle of
rebuilding to spare) it couldn't find the file (or many
others). I thought that resyncing was supposed to be
transparent, yet parts of the fs seemed to be missing.
Everything was there afterwards. Is that normal?
- On a subsequent boot I had to run e2fsck on the three
filesystems housed on the array. Many stray blocks,
illegal inodes, etc were found. An artifact of the rebuild
or unrelated?
Thanks.
WG
--
whollygoat@letterboxes.org
--
http://www.fastmail.fm - Send your email first class
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: some ?? re failed disk and resyncing of array
2009-01-31 8:16 some ?? re failed disk and resyncing of array whollygoat
@ 2009-01-31 10:38 ` David Greaves
2009-01-31 12:03 ` whollygoat
0 siblings, 1 reply; 8+ messages in thread
From: David Greaves @ 2009-01-31 10:38 UTC (permalink / raw)
To: whollygoat; +Cc: linux-raid
whollygoat@letterboxes.org wrote:
> On a boot a couple of days ago, mdadm failed a disk and
> started resyncing to spare (raid5, 6 drives, 5 active, 1
> spare). smartctl -H <disk> returned info (can't remember
> the exact text) that made me suspect the drive was
> fine, but the data connection was bad. Sure enough the
> data cable was damaged. Replaced the cable and smartctl
> sees the disk just fine and reports no errors.
>
> - I'd like to readd the drive as a spare. Is it enough
> to "mdadm --add /dev/hdk" or do I need to prep the drive to
> remove any data that said where it previously belonged
> in the array?
That should work.
Any issues and you can zero the superblock (man mdadm)
No need to zero the disk.
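[Editor's note: a minimal sketch of the sequence David describes, using the device names from this thread; the array device must be named explicitly, and zeroing is only needed if --add objects to stale metadata.]

```shell
# Clear any leftover md metadata on the partition (only if --add complains),
# then add it back; a freshly added device joins the array as a spare.
mdadm --zero-superblock /dev/hdk1
mdadm /dev/md0 --add /dev/hdk1
mdadm --detail /dev/md0 | grep -i spare   # confirm it is counted as a spare
```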
> - When I tried to list some files on one of the filesystems
> on the array (the fact that it took so long to react to
> the ls is how I discovered the box was in the middle of
> rebuilding to spare)
This is OK - resync involves a lot of IO and can slow things down. This is tuneable.
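[Editor's note: the tunables in question are the md resync bandwidth limits under /proc/sys/dev/raid; the numbers shown are the usual kernel defaults, not values from this system.]

```shell
# md resync bandwidth limits, in KiB/s per device.
# speed_limit_min is the guaranteed floor even under foreground IO;
# speed_limit_max caps resync speed when the array is otherwise idle.
cat /proc/sys/dev/raid/speed_limit_min   # typically 1000
cat /proc/sys/dev/raid/speed_limit_max   # typically 200000

# To finish a rebuild sooner at the cost of foreground responsiveness:
echo 50000 > /proc/sys/dev/raid/speed_limit_min
```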
> it couldn't find the file (or many
> others). I thought that resyncing was supposed to be
> transparent, yet parts of the fs seemed to be missing.
> Everything was there afterwards. Is that normal?
No. This is nothing to do with normal md resyncing and certainly not expected.
> - On a subsequent boot I had to run e2fsck on the three
> filesystems housed on the array. Many stray blocks,
> illegal inodes, etc were found. An artifact of the rebuild
> or unrelated?
Well, you had a fault in your IO system; there's a good chance your IO broke.
Verify against a backup.
David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
* Re: some ?? re failed disk and resyncing of array
2009-01-31 10:38 ` David Greaves
@ 2009-01-31 12:03 ` whollygoat
2009-02-01 19:41 ` Bill Davidsen
0 siblings, 1 reply; 8+ messages in thread
From: whollygoat @ 2009-01-31 12:03 UTC (permalink / raw)
To: linux-raid; +Cc: David Greaves
On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
said:
> whollygoat@letterboxes.org wrote:
> > On a boot a couple of days ago, mdadm failed a disk and
> > started resyncing to spare (raid5, 6 drives, 5 active, 1
> > spare). smartctl -H <disk> returned info (can't remember
> > the exact text) that made me suspect the drive was
> > fine, but the data connection was bad. Sure enough the
> > data cable was damaged. Replaced the cable and smartctl
> > sees the disk just fine and reports no errors.
> >
> > - I'd like to readd the drive as a spare. Is it enough
> > to "mdadm --add /dev/hdk" or do I need to prep the drive to
> > remove any data that said where it previously belonged
> > in the array?
> That should work.
> Any issues and you can zero the superblock (man mdadm)
> No need to zero the disk.
Would --re-add be better?
I've noticed something else since I made the initial post
--------- begin output -------------
fly:~# mdadm -D /dev/md0
/dev/md0:
Version : 01.00.03
Creation Time : Sun Jan 11 21:49:36 2009
Raid Level : raid5
Array Size : 312602368 (298.12 GiB 320.10 GB)
Device Size : 156301184 (74.53 GiB 80.03 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Jan 30 15:52:01 2009
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : fly:FlyFileServ_md (local to host fly)
UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
Events : 16
Number Major Minor RaidDevice State
0 33 1 0 active sync /dev/hde1
1 34 1 1 active sync /dev/hdg1
2 56 1 2 active sync /dev/hdi1
5 89 1 3 active sync /dev/hdo1
6 88 1 4 active sync /dev/hdm1
fly:~# mdadm -E /dev/hdo1
/dev/hdo1:
Magic : a92b4efc
Version : 01
Feature Map : 0x1
Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
Name : fly:FlyFileServ_md (local to host fly)
Creation Time : Sun Jan 11 21:49:36 2009
Raid Level : raid5
Raid Devices : 5
Device Size : 234436336 (111.79 GiB 120.03 GB)
Array Size : 625204736 (298.12 GiB 320.10 GB)
Used Size : 156301184 (74.53 GiB 80.03 GB)
Super Offset : 234436464 sectors
State : clean
Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de
Internal Bitmap : 2 sectors from superblock
Update Time : Fri Jan 30 15:52:01 2009
Checksum : 4689ff5 - correct
Events : 16
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
Array State : uuuUu 2 failed
--------- end output -------------
Why does the "Array Slot" field show 7 slots? And why
does the field "Array State" show 2 failed? There
ever only were 6 disks in the array. Only one of those
is currently missing. mdadm -D above doesn't list any
failed devices in the "Failed Devices" field.
Thanks for your answers below as well. It's kind of
what I was expecting. There was a h/w problem that
took ages to track down and I think it was responsible
for all the e2fs errors.
WG
>
> > - When I tried to list some files on one of the filesystems
> > on the array (the fact that it took so long to react to
> > the ls is how I discovered the box was in the middle of
> > rebuilding to spare)
> This is OK - resync involves a lot of IO and can slow things down. This
> is tuneable.
>
> > it couldn't find the file (or many
> > others). I thought that resyncing was supposed to be
> > transparent, yet parts of the fs seemed to be missing.
> > Everything was there afterwards. Is that normal?
> No. This is nothing to do with normal md resyncing and certainly not
> expected.
>
> > - On a subsequent boot I had to run e2fsck on the three
> > filesystems housed on the array. Many stray blocks,
> > illegal inodes, etc were found. An artifact of the rebuild
> > or unrelated?
> Well, you had a fault in your IO system; there's a good chance your IO
> broke.
>
> Verify against a backup.
>
> David
>
>
> --
> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
--
whollygoat@letterboxes.org
--
http://www.fastmail.fm - IMAP accessible web-mail
* Re: some ?? re failed disk and resyncing of array
2009-01-31 12:03 ` whollygoat
@ 2009-02-01 19:41 ` Bill Davidsen
2009-02-02 1:47 ` whollygoat
2009-02-03 0:52 ` zero-superblock, " whollygoat
0 siblings, 2 replies; 8+ messages in thread
From: Bill Davidsen @ 2009-02-01 19:41 UTC (permalink / raw)
To: whollygoat; +Cc: linux-raid, David Greaves
whollygoat@letterboxes.org wrote:
> On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> said:
>
>> whollygoat@letterboxes.org wrote:
>>
>>> On a boot a couple of days ago, mdadm failed a disk and
>>> started resyncing to spare (raid5, 6 drives, 5 active, 1
>>> spare). smartctl -H <disk> returned info (can't remember
>>> the exact text) that made me suspect the drive was
>>> fine, but the data connection was bad. Sure enough the
>>> data cable was damaged. Replaced the cable and smartctl
>>> sees the disk just fine and reports no errors.
>>>
>>> - I'd like to readd the drive as a spare. Is it enough
>>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
>>> remove any data that said where it previously belonged
>>> in the array?
>>>
>> That should work.
>> Any issues and you can zero the superblock (man mdadm)
>> No need to zero the disk.
>>
>
> Would --re-add be better?
>
>
I don't think so. And I would zero the superblock. The more detail you
put into preventing unwanted autodetection the fewer learning
experiences you will have.
> I've noticed something else since I made the initial post
>
> --------- begin output -------------
> fly:~# mdadm -D /dev/md0
> /dev/md0:
> Version : 01.00.03
> Creation Time : Sun Jan 11 21:49:36 2009
> Raid Level : raid5
> Array Size : 312602368 (298.12 GiB 320.10 GB)
> Device Size : 156301184 (74.53 GiB 80.03 GB)
> Raid Devices : 5
> Total Devices : 5
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Fri Jan 30 15:52:01 2009
> State : active
> Active Devices : 5
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Name : fly:FlyFileServ_md (local to host fly)
> UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
> Events : 16
>
> Number Major Minor RaidDevice State
> 0 33 1 0 active sync /dev/hde1
> 1 34 1 1 active sync /dev/hdg1
> 2 56 1 2 active sync /dev/hdi1
> 5 89 1 3 active sync /dev/hdo1
> 6 88 1 4 active sync /dev/hdm1
>
>
> fly:~# mdadm -E /dev/hdo1
> /dev/hdo1:
> Magic : a92b4efc
> Version : 01
> Feature Map : 0x1
> Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
> Name : fly:FlyFileServ_md (local to host fly)
> Creation Time : Sun Jan 11 21:49:36 2009
> Raid Level : raid5
> Raid Devices : 5
>
> Device Size : 234436336 (111.79 GiB 120.03 GB)
> Array Size : 625204736 (298.12 GiB 320.10 GB)
> Used Size : 156301184 (74.53 GiB 80.03 GB)
> Super Offset : 234436464 sectors
> State : clean
> Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de
>
> Internal Bitmap : 2 sectors from superblock
> Update Time : Fri Jan 30 15:52:01 2009
> Checksum : 4689ff5 - correct
> Events : 16
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
> Array State : uuuUu 2 failed
> --------- end output -------------
>
> Why does the "Array Slot" field show 7 slots? And why
> does the field "Array State" show 2 failed? There
> ever only were 6 disks in the array. Only one of those
> is currently missing. mdadm -D above doesn't list any
> failed devices in the "Failed Devices" field.
>
>
No idea, but did you explicitly remove the failed drive? Was there a
failed drive at some time in the past?
I've never seen this, but I always remove drives, which may or may not
be related.
> Thanks for your answers below as well. It's kind of
> what I was expecting. There was a h/w problem that
> took ages to track down and I think it was responsible
> for all the e2fs errors.
>
> WG
>
>
>>> - When I tried to list some files on one of the filesystems
>>> on the array (the fact that it took so long to react to
>>> the ls is how I discovered the box was in the middle of
>>> rebuilding to spare)
>>>
>> This is OK - resync involves a lot of IO and can slow things down. This
>> is tuneable.
>>
>>
>>> it couldn't find the file (or many
>>> others). I thought that resyncing was supposed to be
>>> transparent, yet parts of the fs seemed to be missing.
>>> Everything was there afterwards. Is that normal?
>>>
>> No. This is nothing to do with normal md resyncing and certainly not
>> expected.
>>
>>
>>> - On a subsequent boot I had to run e2fsck on the three
>>> filesystems housed on the array. Many stray blocks,
>>> illegal inodes, etc were found. An artifact of the rebuild
>>> or unrelated?
>>>
>> Well, you had a fault in your IO system; there's a good chance your IO
>> broke.
>>
>> Verify against a backup.
>>
>> David
>>
>>
>> --
>> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
>>
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark
* Re: some ?? re failed disk and resyncing of array
2009-02-01 19:41 ` Bill Davidsen
@ 2009-02-02 1:47 ` whollygoat
2009-02-03 0:52 ` zero-superblock, " whollygoat
1 sibling, 0 replies; 8+ messages in thread
From: whollygoat @ 2009-02-02 1:47 UTC (permalink / raw)
To: linux-raid; +Cc: Bill Davidsen, David Greaves
On Sun, 01 Feb 2009 14:41:37 -0500, "Bill Davidsen" <davidsen@tmr.com>
said:
> whollygoat@letterboxes.org wrote:
> > On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> > said:
> >
> >> whollygoat@letterboxes.org wrote:
> >>
> >>> On a boot a couple of days ago, mdadm failed a disk and
> >>> started resyncing to spare (raid5, 6 drives, 5 active, 1
> >>> spare). smartctl -H <disk> returned info (can't remember
> >>> the exact text) that made me suspect the drive was
> >>> fine, but the data connection was bad. Sure enough the
> >>> data cable was damaged. Replaced the cable and smartctl
> >>> sees the disk just fine and reports no errors.
> >>>
> >>> - I'd like to readd the drive as a spare. Is it enough
> >>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
> >>> remove any data that said where it previously belonged
> >>> in the array?
> >>>
> >> That should work.
> >> Any issues and you can zero the superblock (man mdadm)
> >> No need to zero the disk.
> >>
> >
> > Would --re-add be better?
> >
> >
> I don't think so. And I would zero the superblock. The more detail you
> put into preventing unwanted autodetection the fewer learning
> experiences you will have.
Will do
> > fly:~# mdadm -D /dev/md0
[snip]
> > Raid Devices : 5
> > Total Devices : 5
> > Preferred Minor : 0
> > Persistence : Superblock is persistent
> >
> > Intent Bitmap : Internal
> >
> > Update Time : Fri Jan 30 15:52:01 2009
> > State : active
> > Active Devices : 5
> > Working Devices : 5
> > Failed Devices : 0
> > Spare Devices : 0
[snip]
> >
> > Number Major Minor RaidDevice State
> > 0 33 1 0 active sync /dev/hde1
> > 1 34 1 1 active sync /dev/hdg1
> > 2 56 1 2 active sync /dev/hdi1
> > 5 89 1 3 active sync /dev/hdo1
> > 6 88 1 4 active sync /dev/hdm1
> >
> >
> > fly:~# mdadm -E /dev/hdo1
[snip]
> >
> > Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
> > Array State : uuuUu 2 failed
> > --------- end output -------------
> >
> > Why does the "Array Slot" field show 7 slots? And why
> > does the field "Array State" show 2 failed? There
> > ever only were 6 disks in the array. Only one of those
> > is currently missing. mdadm -D above doesn't list any
> > failed devices in the "Failed Devices" field.
> >
> >
> No idea, but did you explicitly remove the failed drive? Was there a
> failed drive at some time in the past?
No explicit removal. Maybe I should have. I let it rebuild
then shutdown to see if it was just something like cabling.
After dealing with the cabling problem and rebooting mdadm -D
didn't show any failed drives, just as above, so it never occurred
to me to remove the drive.
Is there anything I can do to fix the information reported by
mdadm -E <component device>? Maybe when I add the old drive
as the new spare it will be taken care of?
Thanks,
wg
--
whollygoat@letterboxes.org
--
http://www.fastmail.fm - The way an email service should be
* zero-superblock, Re: some ?? re failed disk and resyncing of array
2009-02-01 19:41 ` Bill Davidsen
2009-02-02 1:47 ` whollygoat
@ 2009-02-03 0:52 ` whollygoat
2009-02-03 8:48 ` David Greaves
1 sibling, 1 reply; 8+ messages in thread
From: whollygoat @ 2009-02-03 0:52 UTC (permalink / raw)
To: linux-raid; +Cc: Bill Davidsen, David Greaves
On Sun, 01 Feb 2009 14:41:37 -0500, "Bill Davidsen" <davidsen@tmr.com>
said:
> whollygoat@letterboxes.org wrote:
> > On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> > said:
> >
> >> whollygoat@letterboxes.org wrote:
> >>
> >>> On a boot a couple of days ago, mdadm failed a disk and
> >>> started resyncing to spare (raid5, 6 drives, 5 active, 1
> >>> spare). smartctl -H <disk> returned info (can't remember
> >>> the exact text) that made me suspect the drive was
> >>> fine, but the data connection was bad. Sure enough the
> >>> data cable was damaged. Replaced the cable and smartctl
> >>> sees the disk just fine and reports no errors.
> >>>
> >>> - I'd like to readd the drive as a spare. Is it enough
> >>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
> >>> remove any data that said where it previously belonged
> >>> in the array?
> >>>
> >> That should work.
> >> Any issues and you can zero the superblock (man mdadm)
> >> No need to zero the disk.
> >>
> >
> > Would --re-add be better?
> >
> >
> I don't think so. And I would zero the superblock. The more detail you
> put into preventing unwanted autodetection the fewer learning
> experiences you will have.
Can anyone provide any more insight with the below?
fly:~# mdadm --zero-superblock /dev/hdk1
mdadm: Unrecognised md component device - /dev/hdk1
fly:~# fdisk -l /dev/hdk
Disk /dev/hdk: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hdk1 1 14593 117218241 da Non-FS data
fly:~# mdadm -a /dev/hdk1
mdadm: /dev/hdk1 does not appear to be an md device
fly:~# smartctl -a /dev/hdk
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar SE family
Device Model: WDC WD1200JB-00GVC0
Serial Number: WD-WCALA2237663
Firmware Version: 08.02D08
User Capacity: 120,034,123,776 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Feb 2 16:50:13 2009 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (3472) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  49) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   126   122   021    Pre-fail  Always       -       4200
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       680
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       10951
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       677
194 Temperature_Celsius     0x0022   112   094   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description     Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline     Completed without error        00%            10922  -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Thanks,
wg
--
whollygoat@letterboxes.org
--
http://www.fastmail.fm - The way an email service should be
* Re: zero-superblock, Re: some ?? re failed disk and resyncing of array
2009-02-03 0:52 ` zero-superblock, " whollygoat
@ 2009-02-03 8:48 ` David Greaves
2009-02-04 4:48 ` whollygoat
0 siblings, 1 reply; 8+ messages in thread
From: David Greaves @ 2009-02-03 8:48 UTC (permalink / raw)
To: whollygoat; +Cc: linux-raid, Bill Davidsen
whollygoat@letterboxes.org wrote:
> Can anyone provide any more insight with the below?
I agree the error messages don't help :)
Old version of mdadm? IIRC the error reports are better now.
> fly:~# mdadm --zero-superblock /dev/hdk1
> mdadm: Unrecognised md component device - /dev/hdk1
It is likely that hdk1 is not an md component device and has no superblock.
> fly:~# mdadm -a /dev/hdk1
> mdadm: /dev/hdk1 does not appear to be an md device
Normally:
mdadm [mode] <raiddevice> [options] <component-devices>
so:
mdadm /dev/md0 -a /dev/hdk1
would work (otherwise which raid are you adding to?)
David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
* Re: zero-superblock, Re: some ?? re failed disk and resyncing of array
2009-02-03 8:48 ` David Greaves
@ 2009-02-04 4:48 ` whollygoat
0 siblings, 0 replies; 8+ messages in thread
From: whollygoat @ 2009-02-04 4:48 UTC (permalink / raw)
To: linux-raid
On Tue, 03 Feb 2009 08:48:47 +0000,
"David Greaves" <david@dgreaves.com> said:
> whollygoat@letterboxes.org wrote:
> > Can anyone provide any more insight with the below?
> I agree the error messages don't help :)
> Old version of mdadm? IIRC the error reports are better now.
fly:~# mdadm -V
mdadm - v2.5.6 - 9 November 2006
debian 4.0
>
> > fly:~# mdadm --zero-superblock /dev/hdk1
> > mdadm: Unrecognised md component device - /dev/hdk1
> It is likely that hdk1 is not an md component device and has no
> superblock.
>
> > fly:~# mdadm -a /dev/hdk1
> > mdadm: /dev/hdk1 does not appear to be an md device
> Normally:
> mdadm [mode] <raiddevice> [options] <component-devices>
> so:
> mdadm /dev/md0 -a /dev/hdk1
> would work (otherwise which raid are you adding to?)
Doh! This happened to me when I was failing and removing
drives to replace them with larger ones. Either the error
message was clearer or I had my head screwed on tighter
'cause I managed to figure out what you've just pointed out:
fly:~# mdadm /dev/md/0 --zero-superblock /dev/hdk1
fly:~# mdadm /dev/md/0 -a /dev/hdk1
mdadm: added /dev/hdk1
Thanks. I'm still concerned about the discrepancy between
--detail <array> and --examine <any-component-device>,
especially since I just zeroed the superblock on k1. That
is what --examine looks at isn't it?
fly:~# mdadm -D /dev/md/0
/dev/md/0:
[snip]
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
[snip]
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
[snip]
Number Major Minor RaidDevice State
0 33 1 0 active sync /dev/hde1
1 34 1 1 active sync /dev/hdg1
2 56 1 2 active sync /dev/hdi1
5 89 1 3 active sync /dev/hdo1
6 88 1 4 active sync /dev/hdm1
7 57 1 - spare /dev/hdk1
fly:~# mdadm -E /dev/hdk1
/dev/hdk1:
[snip]
Array Slot : 7 (0, 1, 2, failed, failed, 3, 4)
Array State : uuuuu 2 failed
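[Editor's note: one way to chase a discrepancy like this, sketched with this thread's device names, is to compare the Events counter and state recorded in each component's superblock against the array's own view; stale members show older Events counts or a different Array State than their peers.]

```shell
# Per-component superblock view vs. the assembled array's view.
for d in /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1 /dev/hdm1 /dev/hdo1; do
    echo "== $d =="
    mdadm -E "$d" | grep -E 'Events|Array State'
done
# The array's own Events counter, for comparison:
mdadm -D /dev/md0 | grep Events
```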
I recently tried to grow the array after replacing, one by
one, 40G drives with the current 80 and 120G drives. That
did not go smoothly and I ended up having to just recreate
the array. I was getting the same kind of bad output from
--examine.
Before I could get the array fully restored from backup, I
discovered some flaky hardware. I suppose that could be
responsible for the strange Array Slot and State output above?
Either that or I am doing something seriously wrong. Does it
seem reasonable to start from scratch again, now that I have
all the h/w issues worked out? or does it seem more like I'm
messing up the way I create it?
# mdadm -C /dev/md/0 -e 1.0 -v -l5 -b internal\
-a yes -n 5 /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1\
/dev/hdm1 -x 1 /dev/hdo1 --name=<name>
wg
--
whollygoat@letterboxes.org
--
http://www.fastmail.fm - mmm... Fastmail...
end of thread, other threads:[~2009-02-04 4:48 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-31 8:16 some ?? re failed disk and resyncing of array whollygoat
2009-01-31 10:38 ` David Greaves
2009-01-31 12:03 ` whollygoat
2009-02-01 19:41 ` Bill Davidsen
2009-02-02 1:47 ` whollygoat
2009-02-03 0:52 ` zero-superblock, " whollygoat
2009-02-03 8:48 ` David Greaves
2009-02-04 4:48 ` whollygoat