First experience with drive being kicked

* First experience with drive being kicked
@ 2010-04-13 22:54 Mark Knecht
  2010-04-14  2:24 ` Mark Knecht
From: Mark Knecht @ 2010-04-13 22:54 UTC
  To: Linux-RAID

OK, I was working inside the box today adding two more drives, and I
may have bumped a cable, though I'm not sure. /dev/md3 was affected,
but md5, built on the same drives, wasn't. It's possible this has been
the case for a day or two and I didn't notice it. These drives are
only a few days old, so I hope I'm not seeing some sort of early
failure. They are supposedly good drives: WD 500GB RAID Edition.

Currently all my RAIDs are RAID1, assembled by the kernel at boot time.
I have no mdadm.conf file, and mdadm is running as a daemon.

From dmesg:

md: considering sdb3 ...
md:  adding sdb3 ...
md:  adding sdc3 ...
md:  adding sda3 ...
md: created md3
md: bind<sda3>
md: bind<sdc3>
md: bind<sdb3>
md: running: <sdb3><sdc3><sda3>
md: kicking non-fresh sdb3 from array!
md: unbind<sdb3>
md: export_rdev(sdb3)
raid1: raid set md3 active with 2 out of 3 mirrors
md3: detected capacity change from 0 to 53694562304
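
In a long boot log, the members md dropped can be picked out by
filtering for the "kicking non-fresh" message. What follows is a
minimal sketch, not a definitive tool: the function name kicked_members
is made up for illustration, and the sample input is just the lines
quoted above. As far as I understand it, md prints this message when a
member's superblock event count is behind the other members, meaning
the member missed some updates.

```shell
#!/bin/sh
# kicked_members: read kernel log text on stdin and print each member
# that md reported as non-fresh, one name per line.
# (Hypothetical helper name; the message format is taken from the log above.)
kicked_members() {
    sed -n 's/^.*md: kicking non-fresh \([^ ]*\) from array!.*$/\1/p'
}

# Sample input: three of the dmesg lines quoted above.
kicked_members <<'EOF'
md: running: <sdb3><sdc3><sda3>
md: kicking non-fresh sdb3 from array!
md: unbind<sdb3>
EOF
```

On a live system the same function could be fed from the kernel log,
e.g. "dmesg | kicked_members".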

How do I go about getting /dev/sdb3 back into the array, and what
sort of checking is advised when this happens before I add it back?
The bad drive (sdb) doesn't look much different from the good drives.
(sda is shown below for comparison; sdc is not.)

cruncher ~ # smartctl -A /dev/sdb
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   239   236   021    Pre-fail  Always       -       1016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   109   105   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

cruncher ~ # smartctl -A /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   239   235   021    Pre-fail  Always       -       1016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   108   106   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

cruncher ~ #
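
Comparing two SMART tables by eye is error-prone, so here is a rough
sketch of a filter that prints only the attributes usually treated as
warning signs when their raw value is non-zero. The attribute selection
(IDs 5, 196, 197, 198 and 199) is an assumption on my part, not
something stated in this thread: the first four are commonly read as
media problems, while 199 (UDMA_CRC_Error_Count) is commonly read as a
cable or link problem. The function name suspect_attrs is hypothetical,
and the filter expects one attribute per line, which is how smartctl
prints the table.

```shell
#!/bin/sh
# suspect_attrs: read `smartctl -A` output on stdin and print the name
# and raw value of selected attributes whose raw value is non-zero.
# Assumption: the ten-column attribute table shown above, one row per line.
suspect_attrs() {
    awk '$1 ~ /^(5|196|197|198|199)$/ && NF >= 10 && $10 + 0 > 0 { print $2, $10 }'
}

# Sample input: three rows from the sdb listing above.
suspect_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
EOF
```

For the values posted above this prints nothing, since every selected
raw value is zero. On a live system the input would come from
"smartctl -A /dev/sdb | suspect_attrs".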

Thanks,
Mark

* Re: First experience with drive being kicked
  2010-04-13 22:54 First experience with drive being kicked Mark Knecht
@ 2010-04-14  2:24 ` Mark Knecht
From: Mark Knecht @ 2010-04-14  2:24 UTC
  To: Linux-RAID

On Tue, Apr 13, 2010 at 3:54 PM, Mark Knecht <markknecht@gmail.com> wrote:
> How do I go about getting /dev/sdb3 back into the array, and what
> sort of checking is advised when this happens before I add it back?

I went ahead and re-added the partition. Hopefully the process I used
below is basically correct.

- Mark

cruncher ~ # man mdadm
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # mdadm /dev/md3 -f /dev/sdb3
mdadm: set device faulty failed for /dev/sdb3:  No such device
cruncher ~ # mdadm /dev/md3 -r /dev/sdb3
mdadm: hot remove failed for /dev/sdb3: No such device or address
cruncher ~ # fdisk /dev/sdb

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x703d11ba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1           7       56196   83  Linux
/dev/sdb2               8         530     4200997+  82  Linux swap / Solaris
/dev/sdb3             536        7063    52436160   fd  Linux raid autodetect
/dev/sdb4            7064       60801   431650485    5  Extended
/dev/sdb5            7064       13591    52436128+  fd  Linux raid autodetect

Command (m for help): q

cruncher ~ # mdadm /dev/md3 -a /dev/sdb3
mdadm: re-added /dev/sdb3
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]
      [>....................]  recovery =  1.3% (695488/52436096) finish=8.6min speed=99355K/sec

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]
      [===========>.........]  recovery = 56.3% (29540736/52436096) finish=5.0min speed=75950K/sec

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[1] sdc3[2] sda3[0]
      52436096 blocks [3/3] [UUU]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ #
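
The transcript above relies on reading the [n/m] counters in
/proc/mdstat by eye: [3/2] means two of three mirrors are active, and
[3/3] means the array is whole again. A small sketch of automating that
check follows. The function name degraded_arrays is hypothetical, and
the parsing assumes the mdstat layout shown in this thread (an
"mdN : ..." line followed by a "blocks" line that carries the counter).

```shell
#!/bin/sh
# degraded_arrays: read /proc/mdstat text on stdin and print
# "<array> <active>/<expected>" for every array with missing members.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the current array name
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            # The matched text looks like "[3/2]": expected/active members.
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0)
                print name, n[2] "/" n[1]
            name = ""
        }
    '
}

# Sample input: the md3 and md5 entries from the first mdstat listing above.
degraded_arrays <<'EOF'
md3 : active raid1 sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]
EOF
```

With that sample the function prints "md3 2/3". On a live system it
would be run as "degraded_arrays < /proc/mdstat"; arrays without an
[n/m] counter, such as the raid0 md11 above, are simply skipped.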