linux-raid.vger.kernel.org archive mirror
* some ?? re failed disk and resyncing of array
@ 2009-01-31  8:16 whollygoat
  2009-01-31 10:38 ` David Greaves
  0 siblings, 1 reply; 8+ messages in thread
From: whollygoat @ 2009-01-31  8:16 UTC (permalink / raw)
  To: linux-raid

On a boot a couple of days ago, mdadm failed a disk and
started resyncing to spare (raid5, 6 drives, 5 active, 1
spare).  smartctl -H <disk> returned info (can't remember
the exact text) that made me suspect the drive was
fine, but the data connection was bad.  Sure enough the
data cable was damaged.  Replaced the cable and smartctl
sees the disk just fine and reports no errors.

- I'd like to readd the drive as a spare.  Is it enough
to "mdadm --add /dev/hdk" or do I need to prep the drive to
remove any data that said where it previously belonged
in the array?  

- When I tried to list some files on one of the filesystems
on the array (the fact that it took so long to react to
the ls is how I discovered the box was in the middle of
rebuilding to spare) it couldn't find the file (or many 
others).  I thought that resyncing was supposed to be
transparent, yet parts of the fs seemed to be missing.
Everything was there afterwards.  Is that normal?

- On a subsequent boot I had to run e2fsck on the three
filesystems housed on the array.  Many stray blocks, 
illegal inodes, etc were found.  An artifact of the rebuild
or unrelated?

Thanks.

WG
-- 
  
  whollygoat@letterboxes.org

-- 
http://www.fastmail.fm - Send your email first class



* Re: some ?? re failed disk and resyncing of array
  2009-01-31  8:16 some ?? re failed disk and resyncing of array whollygoat
@ 2009-01-31 10:38 ` David Greaves
  2009-01-31 12:03   ` whollygoat
  0 siblings, 1 reply; 8+ messages in thread
From: David Greaves @ 2009-01-31 10:38 UTC (permalink / raw)
  To: whollygoat; +Cc: linux-raid

whollygoat@letterboxes.org wrote:
> On a boot a couple of days ago, mdadm failed a disk and
> started resyncing to spare (raid5, 6 drives, 5 active, 1
> spare).  smartctl -H <disk> returned info (can't remember
> the exact text) that made me suspect the drive was
> fine, but the data connection was bad.  Sure enough the
> data cable was damaged.  Replaced the cable and smartctl
> sees the disk just fine and reports no errors.
> 
> - I'd like to readd the drive as a spare.  Is it enough
> to "mdadm --add /dev/hdk" or do I need to prep the drive to
> remove any data that said where it previously belonged
> in the array?
That should work.
Any issues and you can zero the superblock (man mdadm)
No need to zero the disk.
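Something like this should do it (a rough sketch, untested here - assuming the
array is /dev/md0 and the returning partition is /dev/hdk1, so adjust to your
layout):

  # only if a plain --add complains about the stale superblock
  mdadm --zero-superblock /dev/hdk1
  # add the device back; on a healthy array it just becomes a spare
  mdadm /dev/md0 --add /dev/hdk1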

> - When I tried to list some files on one of the filesystems
> on the array (the fact that it took so long to react to
> the ls is how I discovered the box was in the middle of
> rebuilding to spare)
This is OK - resync involves a lot of IO and can slow things down. This is tuneable.
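For example, the resync throttles live in /proc (values in KB/s per device; the
figures here are only illustrative):

  cat /proc/sys/dev/raid/speed_limit_min
  cat /proc/sys/dev/raid/speed_limit_max
  # lower the floor so normal IO wins while a rebuild is running
  echo 1000 > /proc/sys/dev/raid/speed_limit_min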

> it couldn't find the file (or many 
> others).  I thought that resyncing was supposed to be
> transparent, yet parts of the fs seemed to be missing.
> Everything was there afterwards.  Is that normal?
No. This is nothing to do with normal md resyncing and certainly not expected.

> - On a subsequent boot I had to run e2fsck on the three
> filesystems housed on the array.  Many stray blocks, 
> illegal inodes, etc were found.  An artifact of the rebuild
> or unrelated?
Well, you had a fault in your IO system; there's a good chance your IO broke.

Verify against a backup.
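One way, assuming the backup and the array are both mounted (paths below are
just placeholders), is a checksum dry-run with rsync, which lists files that
differ:

  rsync -acnv /backup/data/ /mnt/array/data/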

David


-- 
"Don't worry, you'll be fine; I saw it work in a cartoon once..."


* Re: some ?? re failed disk and resyncing of array
  2009-01-31 10:38 ` David Greaves
@ 2009-01-31 12:03   ` whollygoat
  2009-02-01 19:41     ` Bill Davidsen
  0 siblings, 1 reply; 8+ messages in thread
From: whollygoat @ 2009-01-31 12:03 UTC (permalink / raw)
  To: linux-raid; +Cc: David Greaves


On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
said:
> whollygoat@letterboxes.org wrote:
> > On a boot a couple of days ago, mdadm failed a disk and
> > started resyncing to spare (raid5, 6 drives, 5 active, 1
> > spare).  smartctl -H <disk> returned info (can't remember
> > the exact text) that made me suspect the drive was
> > fine, but the data connection was bad.  Sure enough the
> > data cable was damaged.  Replaced the cable and smartctl
> > sees the disk just fine and reports no errors.
> > 
> > - I'd like to readd the drive as a spare.  Is it enough
> > to "mdadm --add /dev/hdk" or do I need to prep the drive to
> > remove any data that said where it previously belonged
> > in the array?
> That should work.
> Any issues and you can zero the superblock (man mdadm)
> No need to zero the disk.

Would --re-add be better?

I've noticed something else since I made the initial post

--------- begin output -------------
fly:~# mdadm -D /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Sun Jan 11 21:49:36 2009
     Raid Level : raid5
     Array Size : 312602368 (298.12 GiB 320.10 GB)
    Device Size : 156301184 (74.53 GiB 80.03 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Jan 30 15:52:01 2009
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : fly:FlyFileServ_md  (local to host fly)
           UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
         Events : 16

    Number   Major   Minor   RaidDevice State
       0      33        1        0      active sync   /dev/hde1
       1      34        1        1      active sync   /dev/hdg1
       2      56        1        2      active sync   /dev/hdi1
       5      89        1        3      active sync   /dev/hdo1
       6      88        1        4      active sync   /dev/hdm1


fly:~# mdadm -E /dev/hdo1
/dev/hdo1:
          Magic : a92b4efc
        Version : 01
    Feature Map : 0x1
     Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
           Name : fly:FlyFileServ_md  (local to host fly)
  Creation Time : Sun Jan 11 21:49:36 2009
     Raid Level : raid5
   Raid Devices : 5

    Device Size : 234436336 (111.79 GiB 120.03 GB)
     Array Size : 625204736 (298.12 GiB 320.10 GB)
      Used Size : 156301184 (74.53 GiB 80.03 GB)
   Super Offset : 234436464 sectors
          State : clean
    Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Jan 30 15:52:01 2009
       Checksum : 4689ff5 - correct
         Events : 16

         Layout : left-symmetric
     Chunk Size : 64K

    Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
   Array State : uuuUu 2 failed
--------- end output -------------

Why does the "Array Slot" field show 7 slots?  And why
does the field "Array State" show 2 failed?  There
were only ever 6 disks in the array.  Only one of those
is currently missing.  mdadm -D above doesn't list any
failed devices in the "Failed Devices" field.

Thanks for your answers below as well.  It's kind of 
what I was expecting.  There was a h/w problem that
took ages to track down and I think it was responsible
for all the e2fs errors.

WG

> 
> > - When I tried to list some files on one of the filesystems
> > on the array (the fact that it took so long to react to
> > the ls is how I discovered the box was in the middle of
> > rebuilding to spare)
> This is OK - resync involves a lot of IO and can slow things down. This
> is tuneable.
> 
> > it couldn't find the file (or many 
> > others).  I thought that resyncing was supposed to be
> > transparent, yet parts of the fs seemed to be missing.
> > Everything was there afterwards.  Is that normal?
> No. This is nothing to do with normal md resyncing and certainly not
> expected.
> 
> > - On a subsequent boot I had to run e2fsck on the three
> > filesystems housed on the array.  Many stray blocks, 
> > illegal inodes, etc were found.  An artifact of the rebuild
> > or unrelated?
> Well, you had a fault in your IO system; there's a good chance your IO
> broke.
> 
> Verify against a backup.
> 
> David
> 
> 
> -- 
> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
-- 
  
  whollygoat@letterboxes.org

-- 
http://www.fastmail.fm - IMAP accessible web-mail



* Re: some ?? re failed disk and resyncing of array
  2009-01-31 12:03   ` whollygoat
@ 2009-02-01 19:41     ` Bill Davidsen
  2009-02-02  1:47       ` whollygoat
  2009-02-03  0:52       ` zero-superblock, " whollygoat
  0 siblings, 2 replies; 8+ messages in thread
From: Bill Davidsen @ 2009-02-01 19:41 UTC (permalink / raw)
  To: whollygoat; +Cc: linux-raid, David Greaves

whollygoat@letterboxes.org wrote:
> On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> said:
>   
>> whollygoat@letterboxes.org wrote:
>>     
>>> On a boot a couple of days ago, mdadm failed a disk and
>>> started resyncing to spare (raid5, 6 drives, 5 active, 1
>>> spare).  smartctl -H <disk> returned info (can't remember
>>> the exact text) that made me suspect the drive was
>>> fine, but the data connection was bad.  Sure enough the
>>> data cable was damaged.  Replaced the cable and smartctl
>>> sees the disk just fine and reports no errors.
>>>
>>> - I'd like to readd the drive as a spare.  Is it enough
>>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
>>> remove any data that said where it previously belonged
>>> in the array?
>>>       
>> That should work.
>> Any issues and you can zero the superblock (man mdadm)
>> No need to zero the disk.
>>     
>
> Would --re-add be better?
>
>   
I don't think so. And I would zero the superblock. The more detail you 
put into preventing unwanted autodetection, the fewer learning 
experiences you will have.

> I've noticed something else since I made the initial post
>
> --------- begin output -------------
> fly:~# mdadm -D /dev/md0
> /dev/md0:
>         Version : 01.00.03
>   Creation Time : Sun Jan 11 21:49:36 2009
>      Raid Level : raid5
>      Array Size : 312602368 (298.12 GiB 320.10 GB)
>     Device Size : 156301184 (74.53 GiB 80.03 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Fri Jan 30 15:52:01 2009
>           State : active
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            Name : fly:FlyFileServ_md  (local to host fly)
>            UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
>          Events : 16
>
>     Number   Major   Minor   RaidDevice State
>        0      33        1        0      active sync   /dev/hde1
>        1      34        1        1      active sync   /dev/hdg1
>        2      56        1        2      active sync   /dev/hdi1
>        5      89        1        3      active sync   /dev/hdo1
>        6      88        1        4      active sync   /dev/hdm1
>
>
> fly:~# mdadm -E /dev/hdo1
> /dev/hdo1:
>           Magic : a92b4efc
>         Version : 01
>     Feature Map : 0x1
>      Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
>            Name : fly:FlyFileServ_md  (local to host fly)
>   Creation Time : Sun Jan 11 21:49:36 2009
>      Raid Level : raid5
>    Raid Devices : 5
>
>     Device Size : 234436336 (111.79 GiB 120.03 GB)
>      Array Size : 625204736 (298.12 GiB 320.10 GB)
>       Used Size : 156301184 (74.53 GiB 80.03 GB)
>    Super Offset : 234436464 sectors
>           State : clean
>     Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de
>
> Internal Bitmap : 2 sectors from superblock
>     Update Time : Fri Jan 30 15:52:01 2009
>        Checksum : 4689ff5 - correct
>          Events : 16
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
>    Array State : uuuUu 2 failed
> --------- end output -------------
>
> Why does the "Array Slot" field show 7 slots?  And why
> does the field "Array State" show 2 failed?  There
> were only ever 6 disks in the array.  Only one of those
> is currently missing.  mdadm -D above doesn't list any
> failed devices in the "Failed Devices" field.
>
>   
No idea, but did you explicitly remove the failed drive? Was there a 
failed drive at some time in the past?

I've never seen this, but I always remove drives, which may or may not 
be related.
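For reference, explicit removal after a failure is roughly (assuming the array
is /dev/md0 and the dead member is /dev/hdk1):

  mdadm /dev/md0 --fail /dev/hdk1     # only needed if md hasn't already failed it
  mdadm /dev/md0 --remove /dev/hdk1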

> Thanks for your answers below as well.  It's kind of 
> what I was expecting.  There was a h/w problem that
> took ages to track down and I think it was responsible
> for all the e2fs errors.
>
> WG
>
>   
>>> - When I tried to list some files on one of the filesystems
>>> on the array (the fact that it took so long to react to
>>> the ls is how I discovered the box was in the middle of
>>> rebuilding to spare)
>>>       
>> This is OK - resync involves a lot of IO and can slow things down. This
>> is tuneable.
>>
>>     
>>> it couldn't find the file (or many 
>>> others).  I thought that resyncing was supposed to be
>>> transparent, yet parts of the fs seemed to be missing.
>>> Everything was there afterwards.  Is that normal?
>>>       
>> No. This is nothing to do with normal md resyncing and certainly not
>> expected.
>>
>>     
>>> - On a subsequent boot I had to run e2fsck on the three
>>> filesystems housed on the array.  Many stray blocks, 
>>> illegal inodes, etc were found.  An artifact of the rebuild
>>> or unrelated?
>>>       
>> Well, you had a fault in your IO system; there's a good chance your IO
>> broke.
>>
>> Verify against a backup.
>>
>> David
>>
>>
>> -- 
>> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
>>     


-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: some ?? re failed disk and resyncing of array
  2009-02-01 19:41     ` Bill Davidsen
@ 2009-02-02  1:47       ` whollygoat
  2009-02-03  0:52       ` zero-superblock, " whollygoat
  1 sibling, 0 replies; 8+ messages in thread
From: whollygoat @ 2009-02-02  1:47 UTC (permalink / raw)
  To: linux-raid; +Cc: Bill Davidsen, David Greaves


On Sun, 01 Feb 2009 14:41:37 -0500, "Bill Davidsen" <davidsen@tmr.com>
said:
> whollygoat@letterboxes.org wrote:
> > On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> > said:
> >   
> >> whollygoat@letterboxes.org wrote:
> >>     
> >>> On a boot a couple of days ago, mdadm failed a disk and
> >>> started resyncing to spare (raid5, 6 drives, 5 active, 1
> >>> spare).  smartctl -H <disk> returned info (can't remember
> >>> the exact text) that made me suspect the drive was
> >>> fine, but the data connection was bad.  Sure enough the
> >>> data cable was damaged.  Replaced the cable and smartctl
> >>> sees the disk just fine and reports no errors.
> >>>
> >>> - I'd like to readd the drive as a spare.  Is it enough
> >>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
> >>> remove any data that said where it previously belonged
> >>> in the array?
> >>>       
> >> That should work.
> >> Any issues and you can zero the superblock (man mdadm)
> >> No need to zero the disk.
> >>     
> >
> > Would --re-add be better?
> >
> >   
> I don't think so. And I would zero the superblock. The more detail you 
> put into preventing unwanted autodetection, the fewer learning 
> experiences you will have.

Will do
> > fly:~# mdadm -D /dev/md0
[snip]

> >    Raid Devices : 5
> >   Total Devices : 5
> > Preferred Minor : 0
> >     Persistence : Superblock is persistent
> >
> >   Intent Bitmap : Internal
> >
> >     Update Time : Fri Jan 30 15:52:01 2009
> >           State : active
> >  Active Devices : 5
> > Working Devices : 5
> >  Failed Devices : 0
> >   Spare Devices : 0

[snip]
> >
> >     Number   Major   Minor   RaidDevice State
> >        0      33        1        0      active sync   /dev/hde1
> >        1      34        1        1      active sync   /dev/hdg1
> >        2      56        1        2      active sync   /dev/hdi1
> >        5      89        1        3      active sync   /dev/hdo1
> >        6      88        1        4      active sync   /dev/hdm1
> >
> >
> > fly:~# mdadm -E /dev/hdo1

[snip]
> >
> >     Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
> >    Array State : uuuUu 2 failed
> > --------- end output -------------
> >
> > Why does the "Array Slot" field show 7 slots?  And why
> > does the field "Array State" show 2 failed?  There
> > were only ever 6 disks in the array.  Only one of those
> > is currently missing.  mdadm -D above doesn't list any
> > failed devices in the "Failed Devices" field.
> >
> >   
> No idea, but did you explicitly remove the failed drive? Was there a 
> failed drive at some time in the past?

No explicit removal.  Maybe I should have.  I let it rebuild
then shut down to see if it was just something like cabling.
After dealing with the cabling problem and rebooting mdadm -D
didn't show any failed drives, just as above, so it never occurred
to me to remove the drive.

Is there anything I can do to fix the information reported by
mdadm -E <component device>?  Maybe when I add the old drive
as the new spare it will be taken care of?

Thanks,

wg

-- 
  
  whollygoat@letterboxes.org

-- 
http://www.fastmail.fm - The way an email service should be



* zero-superblock, Re: some ?? re failed disk and resyncing of array
  2009-02-01 19:41     ` Bill Davidsen
  2009-02-02  1:47       ` whollygoat
@ 2009-02-03  0:52       ` whollygoat
  2009-02-03  8:48         ` David Greaves
  1 sibling, 1 reply; 8+ messages in thread
From: whollygoat @ 2009-02-03  0:52 UTC (permalink / raw)
  To: linux-raid; +Cc: Bill Davidsen, David Greaves

On Sun, 01 Feb 2009 14:41:37 -0500, "Bill Davidsen" <davidsen@tmr.com>
said:
> whollygoat@letterboxes.org wrote:
> > On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@dgreaves.com>
> > said:
> >   
> >> whollygoat@letterboxes.org wrote:
> >>     
> >>> On a boot a couple of days ago, mdadm failed a disk and
> >>> started resyncing to spare (raid5, 6 drives, 5 active, 1
> >>> spare).  smartctl -H <disk> returned info (can't remember
> >>> the exact text) that made me suspect the drive was
> >>> fine, but the data connection was bad.  Sure enough the
> >>> data cable was damaged.  Replaced the cable and smartctl
> >>> sees the disk just fine and reports no errors.
> >>>
> >>> - I'd like to readd the drive as a spare.  Is it enough
> >>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
> >>> remove any data that said where it previously belonged
> >>> in the array?
> >>>       
> >> That should work.
> >> Any issues and you can zero the superblock (man mdadm)
> >> No need to zero the disk.
> >>     
> >
> > Would --re-add be better?
> >
> >   
> I don't think so. And I would zero the superblock. The more detail you 
> put into preventing unwanted autodetection, the fewer learning 
> experiences you will have.

Can anyone provide any more insight on the below?

fly:~# mdadm --zero-superblock /dev/hdk1
mdadm: Unrecognised md component device - /dev/hdk1

fly:~# fdisk -l /dev/hdk

Disk /dev/hdk: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdk1               1       14593   117218241   da  Non-FS data

fly:~# mdadm -a /dev/hdk1
mdadm: /dev/hdk1 does not appear to be an md device

fly:~# smartctl -a /dev/hdk
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD1200JB-00GVC0
Serial Number:    WD-WCALA2237663
Firmware Version: 08.02D08
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Feb  2 16:50:13 2009 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection:
                                        Enabled.
Self-test execution status:      (   0) The previous self-test routine
completed
                                        without error or no self-test
                                        has ever 
                                        been run.
Total time to complete Offline 
data collection:                 (3472) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection
                                        on/off support.
                                        Suspend Offline collection upon
                                        new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging
                                        support.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  49) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   126   122   021    Pre-fail  Always       -       4200
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       680
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       10951
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       677
194 Temperature_Celsius     0x0022   112   094   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10922         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute
delay.


Thanks,

wg
-- 
  
  whollygoat@letterboxes.org

-- 
http://www.fastmail.fm - The way an email service should be



* Re: zero-superblock, Re: some ?? re failed disk and resyncing of array
  2009-02-03  0:52       ` zero-superblock, " whollygoat
@ 2009-02-03  8:48         ` David Greaves
  2009-02-04  4:48           ` whollygoat
  0 siblings, 1 reply; 8+ messages in thread
From: David Greaves @ 2009-02-03  8:48 UTC (permalink / raw)
  To: whollygoat; +Cc: linux-raid, Bill Davidsen

whollygoat@letterboxes.org wrote:
> Can anyone provide any more insight on the below?
I agree the error messages don't help :)
Old version of mdadm? IIRC the error reports are better now.

> fly:~# mdadm --zero-superblock /dev/hdk1
> mdadm: Unrecognised md component device - /dev/hdk1
It is likely that hdk1 is not an md component device and has no superblock.
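You could check for one with something like (purely diagnostic):
  mdadm --examine /dev/hdk1   # prints the superblock if there is one, errors out otherwise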

> fly:~# mdadm -a /dev/hdk1
> mdadm: /dev/hdk1 does not appear to be an md device
Normally:
  mdadm [mode] <raiddevice> [options] <component-devices>
so:
  mdadm /dev/md0 -a /dev/hdk1
would work (otherwise which raid are you adding to?)


David


-- 
"Don't worry, you'll be fine; I saw it work in a cartoon once..."


* Re: zero-superblock, Re: some ?? re failed disk and resyncing of array
  2009-02-03  8:48         ` David Greaves
@ 2009-02-04  4:48           ` whollygoat
  0 siblings, 0 replies; 8+ messages in thread
From: whollygoat @ 2009-02-04  4:48 UTC (permalink / raw)
  To: linux-raid

On Tue, 03 Feb 2009 08:48:47 +0000, 
"David Greaves" <david@dgreaves.com> said:
> whollygoat@letterboxes.org wrote:
> > Can anyone provide any more insight with the below?
> I agree the error messages don't help :)
> Old version of mdadm? IIRC the error reports are better now.

fly:~# mdadm -V
mdadm - v2.5.6 - 9 November 2006

debian 4.0
> 
> > fly:~# mdadm --zero-superblock /dev/hdk1
> > mdadm: Unrecognised md component device - /dev/hdk1
> It is likely that hdk1 is not an md component device and has no
> superblock.
> 
> > fly:~# mdadm -a /dev/hdk1
> > mdadm: /dev/hdk1 does not appear to be an md device
> Normally:
>   mdadm [mode] <raiddevice> [options] <component-devices>
> so:
>   mdadm /dev/md0 -a /dev/hdk1
> would work (otherwise which raid are you adding to?)

Doh!  This happened to me when I was failing and removing
drives to replace them with larger ones.  Either the error
message was clearer or I had my head screwed on tighter
'cause I managed to figure out what you've just pointed out:

fly:~# mdadm /dev/md/0 --zero-superblock /dev/hdk1

fly:~# mdadm /dev/md/0 -a /dev/hdk1
mdadm: added /dev/hdk1

Thanks.  I'm still concerned about the discrepancy between
--detail <array> and --examine <any-component-device>, 
especially since I just zeroed the superblock on k1.  That
is what --examine looks at, isn't it?
fly:~# mdadm -D /dev/md/0
/dev/md/0:
[snip]
   Raid Devices : 5
  Total Devices : 6
Preferred Minor : 0
[snip]
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

[snip]
    Number   Major   Minor   RaidDevice State
       0      33        1        0      active sync   /dev/hde1
       1      34        1        1      active sync   /dev/hdg1
       2      56        1        2      active sync   /dev/hdi1
       5      89        1        3      active sync   /dev/hdo1
       6      88        1        4      active sync   /dev/hdm1

       7      57        1        -      spare   /dev/hdk1


fly:~# mdadm -E /dev/hdk1
/dev/hdk1:
[snip]
    Array Slot : 7 (0, 1, 2, failed, failed, 3, 4)
   Array State : uuuuu 2 failed

I recently tried to grow the array after replacing, one by 
one, 40G drives with the current 80 and 120G drives.  That
did not go smoothly and I ended up having to just recreate
the array.  I was getting the same kind of bad output from
--examine.

Before I could get the array fully restored from backup, I 
discovered some flaky hardware.  I suppose that could be
responsible for the strange Array Slot and State output above?
Either that or I am doing something seriously wrong.  Does it
seem reasonable to start from scratch again, now that I have
all the h/w issues worked out?  Or does it seem more like I'm
messing up the way I create it?

# mdadm -C /dev/md/0 -e 1.0 -v -l5 -b internal\
  -a yes -n 5 /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1\
  /dev/hdm1 -x 1 /dev/hdo1 --name=<name>

wg
-- 
  
  whollygoat@letterboxes.org

-- 
http://www.fastmail.fm - mmm... Fastmail...


