* First experience with drive being kicked
@ 2010-04-13 22:54 Mark Knecht
From: Mark Knecht @ 2010-04-13 22:54 UTC
To: Linux-RAID
OK, I was working inside the box today adding two more drives, and I
may have bumped a cable, though I'm not sure. /dev/md3 was affected,
but md5, built on the same drives, wasn't. This may have been going on
for a day or two without my noticing. These drives are only a few days
old, so I hope I'm not seeing some sort of early failure. They are
supposedly good drives: WD 500GB RAID Edition.
Currently all of my RAIDs are RAID1, assembled by the kernel at boot
time. I have no mdadm.conf file. mdadm is running as a daemon.
From dmesg:
md: considering sdb3 ...
md: adding sdb3 ...
md: adding sdc3 ...
md: adding sda3 ...
md: created md3
md: bind<sda3>
md: bind<sdc3>
md: bind<sdb3>
md: running: <sdb3><sdc3><sda3>
md: kicking non-fresh sdb3 from array!
md: unbind<sdb3>
md: export_rdev(sdb3)
raid1: raid set md3 active with 2 out of 3 mirrors
md3: detected capacity change from 0 to 53694562304
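As far as I understand it, "non-fresh" means the superblock on that member is behind the other members, so md refuses to trust it at assembly time. A minimal sketch for spotting this in the kernel log is below. The helper name kicked_members is made up for illustration, and it assumes the message wording matches the dmesg lines above.

```shell
#!/bin/sh
# Sketch: list member devices that md kicked as "non-fresh" at assembly.
# Reads kernel log text on stdin, for example:  dmesg | kicked_members
# kicked_members is a hypothetical helper name, not part of mdadm.
kicked_members() {
    # Print only the device name from lines such as
    #   md: kicking non-fresh sdb3 from array!
    sed -n 's/.*md: kicking non-fresh \([^ ]*\) from array.*/\1/p'
}
```

Run against the log above, this would print sdb3 and nothing else.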
How do I go about getting /dev/sdb3 back into the array, and what sort
of checking is advised before I add it back? The SMART data for the
kicked drive (sdb) doesn't look much different from the good drives
(sda shown below; sdc not shown).
cruncher ~ # smartctl -A /dev/sdb
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   239   236   021    Pre-fail  Always       -       1016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   109   105   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
cruncher ~ # smartctl -A /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   239   235   021    Pre-fail  Always       -       1016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   108   106   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
cruncher ~ #
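Comparing two of these tables by eye is error-prone, so here is a rough sketch of a filter for the attributes that seem most relevant to a suspected cable or media problem. The helper name smart_warnings and the choice of attribute IDs (5, 196, 197, 198, 199) are my own assumptions, not an official list. It expects the usual one-row-per-attribute `smartctl -A` layout with RAW_VALUE in the tenth column.

```shell
#!/bin/sh
# Sketch: print watched SMART attributes whose RAW_VALUE is non-zero.
# Reads `smartctl -A` output on stdin, for example:
#   smartctl -A /dev/sdb | smart_warnings
# smart_warnings is a hypothetical helper; the ID list is an assumption.
smart_warnings() {
    # $1 = ID#, $2 = ATTRIBUTE_NAME, $10 = RAW_VALUE
    awk '$1 ~ /^(5|196|197|198|199)$/ && $10 + 0 > 0 { print $2 "=" $10 }'
}
```

For both drives shown above this would print nothing, since all five of those raw values are 0. A non-zero UDMA_CRC_Error_Count is commonly read as a sign of a cabling problem, but treat that as a hint rather than a diagnosis.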
Thanks,
Mark
* Re: First experience with drive being kicked
From: Mark Knecht @ 2010-04-14 2:24 UTC
To: Linux-RAID
On Tue, Apr 13, 2010 at 3:54 PM, Mark Knecht <markknecht@gmail.com> wrote:
> How do I go about getting /dev/sdb3 back into the array, and what
> sort of checking is advised when this happens before I add it back?
So hopefully the process used below is basically correct.
- Mark
cruncher ~ # man mdadm
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdc3[2] sda3[0]
52436096 blocks [3/2] [U_U]
md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
52436032 blocks [3/3] [UUU]
unused devices: <none>
cruncher ~ # mdadm /dev/md3 -f /dev/sdb3
mdadm: set device faulty failed for /dev/sdb3: No such device
cruncher ~ # mdadm /dev/md3 -r /dev/sdb3
mdadm: hot remove failed for /dev/sdb3: No such device or address
cruncher ~ # fdisk /dev/sdb
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): p
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x703d11ba
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 7 56196 83 Linux
/dev/sdb2 8 530 4200997+ 82 Linux swap / Solaris
/dev/sdb3 536 7063 52436160 fd Linux raid autodetect
/dev/sdb4 7064 60801 431650485 5 Extended
/dev/sdb5 7064 13591 52436128+ fd Linux raid autodetect
Command (m for help): q
cruncher ~ # mdadm /dev/md3 -a /dev/sdb3
mdadm: re-added /dev/sdb3
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
52436096 blocks [3/2] [U_U]
      [>....................]  recovery =  1.3% (695488/52436096) finish=8.6min speed=99355K/sec
md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
52436032 blocks [3/3] [UUU]
unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
52436096 blocks [3/2] [U_U]
      [===========>.........]  recovery = 56.3% (29540736/52436096) finish=5.0min speed=75950K/sec
md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
52436032 blocks [3/3] [UUU]
unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdb3[1] sdc3[2] sda3[0]
52436096 blocks [3/3] [UUU]
md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
52436032 blocks [3/3] [UUU]
unused devices: <none>
cruncher ~ #
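Since I only noticed this by accident, a small check along the following lines might help catch it sooner. This is just a sketch: degraded_arrays is a name I made up, and it assumes the member status appears in the [U_U] form shown in the /proc/mdstat output above, where an underscore marks a missing member.

```shell
#!/bin/sh
# Sketch: print the names of md arrays that have a missing member.
# Reads /proc/mdstat text on stdin, for example:
#   degraded_arrays < /proc/mdstat
# degraded_arrays is a hypothetical helper, not part of mdadm.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/         { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/     { print name }    # status such as [U_U]
    '
}
```

With the first /proc/mdstat listing in this message it would report md3 only. An array that is still rebuilding keeps its underscore until recovery finishes, so it is reported during the rebuild as well.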