* How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 2:05 UTC
To: linux-raid; +Cc: ross
I see my array is reconstructing, but I can't tell which disk failed.
Is there a way to? I tried mdadm --detail on the array, mdadm --examine
on the components, and looking at /proc/mdstat, but none of them give
much of a clue.
The disks have 0.90 metadata; mdadm v2.6.7.2 (14 November 2008) on a
2.6.32 kernel.
The machine (real, not virtual) hung, leaving few clues in the logs.
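
A minimal sketch of the checks that usually answer this (device and
array names below are examples):

# dmesg | grep -i -E 'raid1|md[01]|fail|error'   # md logs kicked members and IO errors here
# mdadm --detail /dev/md0 /dev/md1               # a kicked member shows up as faulty or removed
# grep . /sys/block/md1/md/dev-*/state           # per-member state (in_sync, faulty, spare, ...)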
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
      730523648 blocks [3/3] [UUU]
      [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec
unused devices: <none>
# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90
Creation Time : Mon Dec 15 06:50:18 2008
Raid Level : raid1
Array Size : 730523648 (696.68 GiB 748.06 GB)
Used Dev Size : 730523648 (696.68 GiB 748.06 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jan 7 17:17:41 2013
State : active, resyncing
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Rebuild Status : 0% complete
UUID : b77027df:d6aa474a:c4290e12:319afc54
Events : 0.5078497
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 20 1 active sync /dev/sdb4
2 8 36 2 active sync /dev/sdc4
The system is currently sluggish and the load is 13; I suspect whatever went wrong is happening again.
Thanks.
Ross

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 5:19 UTC
To: Ross Boylan; +Cc: linux-raid

On 1/7/2013 8:05 PM, Ross Boylan wrote:
> I see my array is reconstructing, but I can't tell which disk failed.

> md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
>       96256 blocks [3/3] [UUU]
>
> md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
>       730523648 blocks [3/3] [UUU]

Your two md/RAID1 arrays are built on partitions on the same set of 3
disks.  You likely didn't have a disk failure, or md0 would be
rebuilding as well.  Your failure, or hiccup, is of some other nature,
and apparently only affected md1.

> [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec

Rebuilding a RAID1 on modern hardware should scream.  You're getting
resync throughput of less than 1MB/s.  Estimated completion time is
9.8 _days_ to rebuild a mirror partition.  This is insanely high.

Either you've tweaked your resync throughput down to 1MB/s, or you have
some other process(es) doing serious IO, robbing the resync of
throughput.  Consider running iotop to determine if another process(es)
is eating IO bandwidth.

-- 
Stan
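
The md resync floor and ceiling can be inspected through procfs/sysfs;
a minimal sketch, assuming md1 and stock paths on a 2.6.32 kernel (the
50000 figure is only an example value):

# cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# cat /sys/block/md1/md/sync_speed                  # current resync rate, KiB/s
# echo 50000 > /proc/sys/dev/raid/speed_limit_min   # insist on ~50 MB/s even under competing IO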

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 6:59 UTC
To: stan; +Cc: linux-raid

On Mon, 2013-01-07 at 23:19 -0600, Stan Hoeppner wrote:
> On 1/7/2013 8:05 PM, Ross Boylan wrote:
> > I see my array is reconstructing, but I can't tell which disk failed.
>
> > md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
> >       96256 blocks [3/3] [UUU]
> >
> > md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
> >       730523648 blocks [3/3] [UUU]
>
> Your two md/RAID1 arrays are built on partitions on the same set of 3
> disks.  You likely didn't have a disk failure, or md0 would be
> rebuilding as well.  Your failure, or hiccup, is of some other nature,
> and apparently only affected md1.

I assume something went wrong while accessing one of the partitions, and
that there is a problem with the disk that partition is on.  Phrased
more carefully: which partition failed and is being resynced into md1?
I can't tell.  If I knew, would it be safe to mdadm --fail that
partition in the midst of the rebuild?

Once the system starts, md0 is almost never accessed (it's /boot).

> > [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec
>
> Rebuilding a RAID1 on modern hardware should scream.  You're getting
> resync throughput of less than 1MB/s.  Estimated completion time is 9.8
> _days_ to rebuild a mirror partition.  This is insanely high.

Yes.  It seems to be doing better now:

# date; cat /proc/mdstat
Mon Jan  7 21:37:46 PST 2013
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
      730523648 blocks [3/3] [UUU]
      [===========>.........]  resync = 57.8% (422846976/730523648) finish=452.5min speed=11329K/sec

unused devices: <none>

This is more in line with what I remember when I originally synced the
partitions, which I remember as taking 4-6 hours (it's clearly still
much slower than that pace).

> Either you've tweaked your resync throughput down to 1MB/s, or you have
> some other process(es) doing serious IO, robbing the resync of
> throughput.

Isn't it possible there's a hardware problem, e.g., one leading to a
failure/retry cycle?

> Consider running iotop to determine if another process(es)
> is eating IO bandwidth.

I did, though it's probably a little late.  Here's a fairly typical
result (the iotop command line is shown on the last line):

Total DISK READ: 99.09 K/s | Total DISK WRITE: 25.26 K/s
  PID  USER  DISK READ   DISK WRITE  SWAPIN   IO      COMMAND
 4263  root    0 B/s       0 B/s     0.00 %  8.40 %   [kjournald]
 1204  root   99.09 K/s    0 B/s     0.00 %  4.68 %   [kcopyd]
 1193  root    0 B/s       0 B/s     0.00 %  4.68 %   [kdmflush]
11874  root    0 B/s      25.26 K/s  0.00 %  0.00 %   python /usr/bin/iotop -d 2 -n 20 -b

When I restarted, the system had been effectively down for ~1.5 days,
so I guess it's possible that a lot of housekeeping was going on.
However, top didn't show any noticeable use of CPU.

A more recent check shows the speed continuing to rise; if the value is
an average and it started slow, that would explain it:

date; cat /proc/mdstat
Mon Jan  7 22:56:23 PST 2013
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
      730523648 blocks [3/3] [UUU]
      [==================>..]  resync = 91.8% (670929280/730523648) finish=19.4min speed=51057K/sec

Ross

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 7:17 UTC
To: linux-raid@vger.kernel.org

On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> Isn't it possible there's a hardware problem, e.g., leading to a
> failure/retry cycle?

smartctl -a /dev/sda
smartctl -a /dev/sdb
smartctl -a /dev/sdc

Compare them.  If there was a write failure reported by the drive, md
would have marked the device faulty.

Chris Murphy
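
A compact way to put the three drives side by side is a short loop over
the attributes that most often flag a failing disk (a sketch; adjust the
device list and attribute names to taste):

for d in /dev/sda /dev/sdb /dev/sdc; do
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
done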

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 7:49 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 00:17 -0700, Chris Murphy wrote:
> On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > Isn't it possible there's a hardware problem, e.g., leading to a
> > failure/retry cycle?
>
> smartctl -a /dev/sda
> smartctl -a /dev/sdb
> smartctl -a /dev/sdc
>
> Compare them.  If there was a write failure reported by the drive, md
> would have marked the device faulty.

SMART seems to think they are all OK, though my understanding of it is
limited (e.g., the logs showed SMART reporting Temperature_Celsius of
110, but I think that's a normalized value for a raw value of 42,
meaning the temperature is 42 degrees Celsius).  Do I need to manually
run a test before the report reflects current conditions?  At any rate,
I did run one (just a short test), and the drives passed.

The raw value (last column) for one of the parameters seems to be
changing extremely rapidly, and perhaps is overflowing:

# date; smartctl -a /dev/sda | grep 195
Mon Jan  7 23:11:03 PST 2013
195 Hardware_ECC_Recovered  0x001a  059  024  000  Old_age  Always  -  241377818
# date; smartctl -a /dev/sda | grep 195
Mon Jan  7 23:12:26 PST 2013
195 Hardware_ECC_Recovered  0x001a  056  024  000  Old_age  Always  -  3600778

Perhaps someone on this list can interpret that better than I can.

My thought was disk failure (not necessarily complete failure) ->
system lockup.  Continued disk flakiness leads to continued slowness
after the restart as, e.g., the disk keeps retrying operations that
fail.

I infer you have a different scenario in mind: the system freaks out
for a reason unrelated to the disk.  The resulting shutdown (which was
a manual power off) leaves the arrays and their components in a funky
state.  When the system comes back, it fixes things up.

Even if this did happen, in RAID 1 wouldn't some of the components
(partitions in my case) be deemed good and others bad, with the latter
resynced to match the former?  And if that is happening, why can't I
tell which partition(s) are the master (considered good) and which are
not (being overwritten with the contents of the master)?

The sync just completed, so I can no longer poke around while the
rebuild is in process.  Bad for learning and diagnosis, but good for
almost every other purpose.

Ross

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 8:48 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 12:49 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> The raw value (last column) for one of the parameters seems to be
> changing extremely rapidly, and perhaps is overflowing:
> # date; smartctl -a /dev/sda | grep 195
> Mon Jan  7 23:11:03 PST 2013
> 195 Hardware_ECC_Recovered  0x001a  059  024  000  Old_age  Always  -  241377818
> # date; smartctl -a /dev/sda | grep 195
> Mon Jan  7 23:12:26 PST 2013
> 195 Hardware_ECC_Recovered  0x001a  056  024  000  Old_age  Always  -  3600778

Not good.  The current value is 56, the worst is 24, and the threshold
is 0.  These are high values.  The firmware is doing its job in that
it's fixing errors.  But the fact it has to at this rate is not a good
sign.  Post sda's full attribute list.

Is this disk under warranty?  If it is I'd just get rid of it.

Chris Murphy
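
A quick way to see how fast that raw counter is actually moving is to
sample it on a fixed interval (a sketch; the 60-second interval is
arbitrary):

while true; do
    date
    smartctl -A /dev/sda | grep Hardware_ECC_Recovered
    sleep 60
done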

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 9:32 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 01:48 -0700, Chris Murphy wrote:
> On Jan 8, 2013, at 12:49 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> >
> > The raw value (last column) for one of the parameters seems to be
> > changing extremely rapidly, and perhaps is overflowing:
> > # date; smartctl -a /dev/sda | grep 195
> > Mon Jan  7 23:11:03 PST 2013
> > 195 Hardware_ECC_Recovered  0x001a  059  024  000  Old_age  Always  -  241377818
> > # date; smartctl -a /dev/sda | grep 195
> > Mon Jan  7 23:12:26 PST 2013
> > 195 Hardware_ECC_Recovered  0x001a  056  024  000  Old_age  Always  -  3600778
>
> Not good. The current value is 56, the worst is 24, and the threshold
> is 0. These are high values.

Do you mean 56, 24, and 0 are high values?  Or the raw values are high?
Is the raw value wrapping around?

> The firmware is doing its job in that it's fixing errors. But the fact
> it has to at this rate is not a good sign. Post sda's full attribute
> list.

--------------------------------------------------------------------------------
# date; smartctl -a /dev/sda
Tue Jan  8 01:20:54 PST 2013
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3750330NS
Serial Number:    9QK1MBCW
Firmware Version: SN05
User Capacity:    750,156,374,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Jan  8 01:20:54 2013 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
                                          was completed without error.
                                          Auto Offline Data Collection: Enabled.
# The next item seems peculiar.
# It sounds as if the test was aborted, yet the test result above is passed.
Self-test execution status:      (  41)   The self-test routine was interrupted
                                          by the host with a hard or soft reset.
Total time to complete Offline
data collection:                 ( 642)   seconds.
Offline data collection
capabilities:                    (0x7b)   SMART execute Offline immediate.
                                          Auto Offline data collection on/off support.
                                          Suspend Offline collection upon new command.
                                          Offline surface scan supported.
                                          Self-test supported.
                                          Conveyance Self-test supported.
                                          Selective Self-test supported.
SMART capabilities:              (0x0003) Saves SMART data before entering
                                          power-saving mode.
                                          Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
                                          General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1)   minutes.
Extended self-test routine
recommended polling time:        ( 177)   minutes.
Conveyance self-test routine
recommended polling time:        (   2)   minutes.
SCT capabilities:                (0x003d) SCT Status supported.
                                          SCT Feature Control supported.
                                          SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate      0x000f  075   063   044    Pre-fail Always  -           38010669
  3 Spin_Up_Time             0x0003  099   099   000    Pre-fail Always  -           0
  4 Start_Stop_Count         0x0032  100   100   020    Old_age  Always  -           101
  5 Reallocated_Sector_Ct    0x0033  099   099   036    Pre-fail Always  -           31
  7 Seek_Error_Rate          0x000f  078   060   030    Pre-fail Always  -           65563711282
  9 Power_On_Hours           0x0032  061   061   000    Old_age  Always  -           34776
 10 Spin_Retry_Count         0x0013  100   100   097    Pre-fail Always  -           0
 12 Power_Cycle_Count        0x0032  100   037   020    Old_age  Always  -           102
184 Unknown_Attribute        0x0032  100   100   099    Old_age  Always  -           0
187 Reported_Uncorrect       0x0032  100   100   000    Old_age  Always  -           0
188 Unknown_Attribute        0x0032  100   088   000    Old_age  Always  -           335
189 High_Fly_Writes          0x003a  100   100   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel  0x0022  066   057   045    Old_age  Always  -           34 (Lifetime Min/Max 34/36)
194 Temperature_Celsius      0x0022  034   043   000    Old_age  Always  -           34 (0 18 0 0)
195 Hardware_ECC_Recovered   0x001a  044   024   000    Old_age  Always  -           38010669
197 Current_Pending_Sector   0x0012  100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable    0x0010  100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count     0x003e  200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Interrupted (host reset)  00%        34773            -
# 2  Extended offline   Completed without error   00%        32464            -
# 3  Extended offline   Completed without error   00%        21385            -
# 4  Short offline      Completed without error   00%        20993            -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
--------------------------------------------------------------------------------

> Is this disk under warranty? If it is I'd just get rid of it.

I think it's over 3 years old, so probably not in warranty; it might be
if it's a 5 year warranty.

Fortunately, I've already got new disks in the machine.  The transition
has proved challenging.

I was more or less ready to go, but I wanted to do some experiments with
the alignment of partitions and other parameters.  Any suggestions would
be great.

Ross

P.S. Here are the results for sdb, which has also been generating
chatter in the logs.
--------------------------------------------------------------------------------
Mon Jan  7 22:51:59 PST 2013
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2003FYYS-02W0B1
Serial Number:    WD-WCAY00580447
Firmware Version: 01.01D02
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jan  7 22:51:59 2013 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
                                          was completed without error.
                                          Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
                                          without error or no self-test has ever
                                          been run.
Total time to complete Offline
data collection:                 (29760)  seconds.
Offline data collection
capabilities:                    (0x7b)   SMART execute Offline immediate.
                                          Auto Offline data collection on/off support.
                                          Suspend Offline collection upon new command.
                                          Offline surface scan supported.
                                          Self-test supported.
                                          Conveyance Self-test supported.
                                          Selective Self-test supported.
SMART capabilities:              (0x0003) Saves SMART data before entering
                                          power-saving mode.
                                          Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
                                          General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2)   minutes.
Extended self-test routine
recommended polling time:        ( 255)   minutes.
Conveyance self-test routine
recommended polling time:        (   5)   minutes.
SCT capabilities:                (0x303f) SCT Status supported.
                                          SCT Feature Control supported.
                                          SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate      0x002f  200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time             0x0027  253   253   021    Pre-fail Always  -           7725
  4 Start_Stop_Count         0x0032  100   100   000    Old_age  Always  -           14
  5 Reallocated_Sector_Ct    0x0033  200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate          0x002e  200   200   000    Old_age  Always  -           0
  9 Power_On_Hours           0x0032  098   098   000    Old_age  Always  -           1676
 10 Spin_Retry_Count         0x0032  100   253   000    Old_age  Always  -           0
 11 Calibration_Retry_Count  0x0032  100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count        0x0032  100   100   000    Old_age  Always  -           13
192 Power-Off_Retract_Count  0x0032  200   200   000    Old_age  Always  -           8
193 Load_Cycle_Count         0x0032  200   200   000    Old_age  Always  -           5
194 Temperature_Celsius      0x0022  107   102   000    Old_age  Always  -           45
196 Reallocated_Event_Count  0x0032  200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector   0x0032  200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable    0x0030  200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count     0x0032  200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate    0x0008  200   200   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error   00%        1676             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
--------------------------------------------------------------------------------
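
If anyone wants to re-run the tests cleanly, the usual sequence is
roughly this (a sketch; the long test on the 750GB drive takes about
the 177 minutes smartctl quotes above, and is best run while the box
is otherwise idle):

# smartctl -t short /dev/sda     # quick sanity check, ~1 minute
# smartctl -t long /dev/sda      # full surface scan
# smartctl -l selftest /dev/sda  # read the results afterwards
# smartctl -l error /dev/sda     # the drive's own error log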

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 17:36 UTC
To: Ross Boylan; +Cc: linux-raid@vger.kernel.org

On Jan 8, 2013, at 2:32 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> On Tue, 2013-01-08 at 01:48 -0700, Chris Murphy wrote:
>> Not good. The current value is 56, the worst is 24, and the threshold
>> is 0. These are high values.
> Do you mean 56, 24, and 0 are high values?  Or the raw values are high?

0 is the point at which the drive will change its health from passing
to failing.  It's gotten as low as 24.  So I'd say it's pre-failing, it
just isn't telling you that literally.  As raw values go up, the current
value goes down.  The closer current and threshold are, the worse the
health of the drive for that particular attribute.  It's actually a bit
more complicated than that; there's lots of discussion of this on the
smartmontools site.

> Is the raw value wrapping around?

No idea.

> # date; smartctl -a /dev/sda
> ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate      0x000f  075   063   044    Pre-fail Always  -           38010669
>   3 Spin_Up_Time             0x0003  099   099   000    Pre-fail Always  -           0
>   4 Start_Stop_Count         0x0032  100   100   020    Old_age  Always  -           101
>   5 Reallocated_Sector_Ct    0x0033  099   099   036    Pre-fail Always  -           31
>   7 Seek_Error_Rate          0x000f  078   060   030    Pre-fail Always  -           65563711282
>   9 Power_On_Hours           0x0032  061   061   000    Old_age  Always  -           34776
>  10 Spin_Retry_Count         0x0013  100   100   097    Pre-fail Always  -           0
>  12 Power_Cycle_Count        0x0032  100   037   020    Old_age  Always  -           102
> 184 Unknown_Attribute        0x0032  100   100   099    Old_age  Always  -           0
> 187 Reported_Uncorrect       0x0032  100   100   000    Old_age  Always  -           0
> 188 Unknown_Attribute        0x0032  100   088   000    Old_age  Always  -           335
> 189 High_Fly_Writes          0x003a  100   100   000    Old_age  Always  -           0
> 190 Airflow_Temperature_Cel  0x0022  066   057   045    Old_age  Always  -           34 (Lifetime Min/Max 34/36)
> 194 Temperature_Celsius      0x0022  034   043   000    Old_age  Always  -           34 (0 18 0 0)
> 195 Hardware_ECC_Recovered   0x001a  044   024   000    Old_age  Always  -           38010669
> 197 Current_Pending_Sector   0x0012  100   100   000    Old_age  Always  -           0
> 198 Offline_Uncorrectable    0x0010  100   100   000    Old_age  Offline -           0
> 199 UDMA_CRC_Error_Count     0x003e  200   200   000    Old_age  Always  -           0

So there have been reallocated sectors, so some of them are bad.  And
since they tend to be located in groups, it probably explains why you
had a slow initial rebuild that then sped up.

Again, if it's under warranty, get rid of it.  If it's not, well, I'd
probably still get rid of it or use it for something inconsequential,
after using hdparm to secure erase it (or use dd to write zeros, which
is OK for HDDs, not OK for SSDs).

> Fortunately, I've already got new disks in the machine.  The transition
> has proved challenging.
>
> I was more or less ready to go, but I wanted to do some experiments with
> the alignment of partitions and other parameters.  Any suggestions would
> be great.

You must've missed the other email I sent about alignment.  The reds
are not aligned.  And you're using completely whacky partition sizes
between sda and sd[bc] for reasons I don't understand.
http://www.spinics.net/lists/raid/msg41506.html

> P.S. Here are the results for sdb, which has also been generating
> chatter in the logs.

What do you mean by chatter in the logs?

I don't see anything wrong here, but as something like 35% of drive
failures occur without SMART ever indicating a single problem, who
knows.

Chris Murphy

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 22:30 UTC
To: Ross Boylan; +Cc: Chris Murphy, linux-raid@vger.kernel.org

On 1/8/2013 3:32 AM, Ross Boylan wrote:
> # date; smartctl -a /dev/sda
> Device Model:     ST3750330NS
> Serial Number:    9QK1MBCW
> ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate      0x000f  075   063   044    Pre-fail Always  -           38010669
>   7 Seek_Error_Rate          0x000f  078   060   030    Pre-fail Always  -           65563711282
> 195 Hardware_ECC_Recovered   0x001a  044   024   000    Old_age  Always  -           38010669

This 750GB Seagate drive is FUBAR.  Replace it as soon as possible.

> Device Model:     WDC WD2003FYYS-02W0B1
> Serial Number:    WD-WCAY00580447
> ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate      0x002f  200   200   051    Pre-fail Always  -           0
>   5 Reallocated_Sector_Ct    0x0033  200   200   140    Pre-fail Always  -           0
>   7 Seek_Error_Rate          0x002e  200   200   000    Old_age  Always  -           0
> 196 Reallocated_Event_Count  0x0032  200   200   000    Old_age  Always  -           0
> 197 Current_Pending_Sector   0x0032  200   200   000    Old_age  Always  -           0
> 198 Offline_Uncorrectable    0x0030  200   200   000    Old_age  Offline -           0

This Western Digital appears to be fine.  Please show log entries that
lead you to believe it has problems.

-- 
Stan
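
For the record, swapping the failing member out of one of these mirrors
would look roughly like this (a sketch only; /dev/sda3 and /dev/sdd4 are
placeholder names, and current backups are assumed before touching the
array):

# mdadm /dev/md1 --fail /dev/sda3 --remove /dev/sda3
# mdadm /dev/md1 --add /dev/sdd4        # the new, properly aligned partition
# cat /proc/mdstat                      # wait for the resync to finish
# mdadm --grow /dev/md1 --size=max      # only once every member is the larger size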

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 7:59 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 00:17 -0700, Chris Murphy wrote:
> On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > Isn't it possible there's a hardware problem, e.g., leading to a
> > failure/retry cycle?
>
> smartctl -a /dev/sda
> smartctl -a /dev/sdb
> smartctl -a /dev/sdc
>
> Compare them.  If there was a write failure reported by the drive, md
> would have marked the device faulty.

In response to your other query about the locations of the partitions:

# parted /dev/sda unit s p select /dev/sdb p select /dev/sdc p
Model: ATA ST3750330NS (scsi)
Disk /dev/sda: 1465149168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start     End          Size         Type     File system  Flags
 1      63s       192779s      192717s      primary  ext3         boot, raid
 2      192780s   4096574s     3903795s     primary
 3      4096575s  1465144064s  1461047490s  primary               raid

Using /dev/sdb
Model: ATA WDC WD2003FYYS-0 (scsi)
Disk /dev/sdb: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start     End          Size         File system  Name                   Flags
 1      34s       999999s      999966s                   extended boot loaders
 2      1000000s  2929687s     1929688s     ext3         /boot                  boot
 3      2929688s  6835937s     3906250s                  swap
 4      6835938s  3907029134s  3900193197s               main

Using /dev/sdc
Model: ATA WDC WD2003FYYS-0 (scsi)
Disk /dev/sdc: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start     End          Size         File system  Name                   Flags
 1      34s       999999s      999966s                   extended boot loaders
 2      1000000s  2929687s     1929688s     ext3         boot                   boot
 3      2929688s  6835937s     3906250s                  swap
 4      6835938s  3907029134s  3900193197s               main

BTW the spec sheet for the WDC "red" drives says they use advanced
formatting (I may not have the buzzword quite right) with physical
sectors of 4k.  So the reported sector size is a fib.

Ross

P.S. I didn't explicitly respond to Stan's comment that I might have
tweaked my sync speed down.  I haven't deliberately, though I suppose
something could have done it behind my back.  The increased performance
as time passed does suggest something else was loading the io system,
though it seems odd that happened without noticeable cpu use.

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 9:10 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 12:59 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> Using /dev/sdb
> Model: ATA WDC WD2003FYYS-0 (scsi)
> Disk /dev/sdb: 3907029168s
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start     End          Size         File system  Name                   Flags
>  1      34s       999999s      999966s                   extended boot loaders
>  2      1000000s  2929687s     1929688s     ext3         /boot                  boot
>  3      2929688s  6835937s     3906250s                  swap
>  4      6835938s  3907029134s  3900193197s               main
>
> Using /dev/sdc
> Model: ATA WDC WD2003FYYS-0 (scsi)
> Disk /dev/sdc: 3907029168s
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start     End          Size         File system  Name                   Flags
>  1      34s       999999s      999966s                   extended boot loaders
>  2      1000000s  2929687s     1929688s     ext3         boot                   boot
>  3      2929688s  6835937s     3906250s                  swap
>  4      6835938s  3907029134s  3900193197s               main
>
> BTW the spec sheet for the WDC "red" drives says they use advanced
> formatting (I may not have the buzzword quite right) with physical
> sectors of 4k.  So the reported sector size is a fib.

Yeah you're using an old version of parted for it to not recognize that
the physical sectors are 4096 bytes.  The thing is, that it's a 512e
disk, so the LBA's are still 512 bytes.  And by the looks of it, your
partitions are not aligned on those 4K physical sectors because the
start value is 34s.  In any recent fdisk or parted or gdisk, the start
sector is 2048 (1MiB), and each partition is aligned on 8-sector
boundaries.  So your disks aren't properly partitioned, and you're
getting a performance hit because of it.

What I'm not getting is why your md0, comprised of sda1 at 192717s, and
sd[bc]2 are 1929688s.  What am I missing here?  Because those values
aren't at all the same.  It's a 10x difference.

And then with md1, comprised of sda3 at 1461047490s, and sd[bc]4 are
3900193197s.  A 2.66x difference.  What is this?  sda1 is 696GiB, while
sd[bc]4 are 1.8TiB each?

Ummm…

Chris Murphy
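
One way to confirm what the kernel sees and whether an existing
partition start is 4K-aligned (a sketch; sdb and partition 4 are just
examples):

# cat /sys/block/sdb/queue/logical_block_size /sys/block/sdb/queue/physical_block_size
# cat /sys/block/sdb/sdb4/start                     # start sector of the partition
# echo $(( $(cat /sys/block/sdb/sdb4/start) % 8 ))  # 0 means the start falls on a 4K boundary
# parted /dev/sdb align-check optimal 4             # needs a newer parted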

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 21:54 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 02:10 -0700, Chris Murphy wrote:
> Yeah you're using an old version of parted for it to not recognize that
> the physical sectors are 4096 bytes.  The thing is, that it's a 512e
> disk, so the LBA's are still 512 bytes.  And by the looks of it, your
> partitions are not aligned on those 4K physical sectors because the
> start value is 34s.  In any recent fdisk or parted or gdisk, the start
> sector is 2048 (1MiB), and each partition is aligned on 8-sector
> boundaries.  So your disks aren't properly partitioned, and you're
> getting a performance hit because of it.
>
> What I'm not getting is why your md0, comprised of sda1 at 192717s, and
> sd[bc]2 are 1929688s.  What am I missing here?  Because those values
> aren't at all the same.  It's a 10x difference.

I'm migrating the array from an old, smaller disk (it was a pair of
disks, but I've already pulled one) to newer, larger disks.  Eventually
the current sda will go away (I was going to keep using it, but given
the recent problems, as you suggest, best to ditch it) and the RAID
arrays will grow to fill the new space.

I manually specified the current layout of the bigger disks (sdb and c);
at least some of the time I specified the exact sector.  I picked 34
because that seems to be the traditional offset for the first partition
(and the one my tool generated when I gave it sizes in grosser units
than sectors or told it to start at 0).

Apparently some disks do a logical to physical remap that includes an
offset as well as a change in the sector size.  Should I check for that,
or should I just assume that I should start my partitions on sectors
that are multiples of 8?

You also asked what I meant by chatter in the logs about sdb.  Here are
some entries from shortly before the system locked up:

Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
Jan  6 03:45:25 markov smartd[5368]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 109

I am less excited about that since discovering that the message about
sdb does not mean it's running at over 100 degrees Celsius (the raw
value is around 45).

The logs from the restart show:

Jan  7 17:19:09 markov kernel: [    2.928055] ata2.00: SATA link down (SStatus 0 SControl 0)
Jan  7 17:19:09 markov kernel: [    2.928102] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  7 17:19:09 markov kernel: [    2.944459] ata2.01: ATA-8: WDC WD2003FYYS-02W0B1, 01.01D02, max UDMA/133
Jan  7 17:19:09 markov kernel: [    2.944498] ata2.01: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Jan  7 17:19:09 markov kernel: [    2.952486] ata2.01: configured for UDMA/133
Jan  7 17:19:09 markov kernel: [    2.952642] scsi 1:0:1:0: Direct-Access     ATA      WDC WD2003FYYS-0 01.0 PQ: 0 ANSI: 5
Jan  7 17:19:09 markov kernel: [    2.952918] scsi 2:0:0:0: Direct-Access     ATA      WDC WD2003FYYS-0 01.0 PQ: 0 ANSI: 5
Jan  7 17:19:09 markov kernel: [    2.953695] scsi 3:0:0:0: CD-ROM            TSSTcorp CDDVDW SH-S223B  SB00 PQ: 0 ANSI: 5
Jan  7 17:19:09 markov kernel: [    3.289403] md: md0 stopped.
Jan  7 17:19:09 markov kernel: [    3.328423] md: md1 stopped.
Jan  7 17:19:09 markov kernel: [    3.382868] md: bind<sdb4>
Jan  7 17:19:09 markov kernel: [    3.383054] md: bind<sdc4>
Jan  7 17:19:09 markov kernel: [    3.383347] md: bind<sda3>
Jan  7 17:19:09 markov kernel: [    3.390925] raid1: md1 is not clean -- starting background reconstruction
Jan  7 17:19:09 markov kernel: [    3.390963] raid1: raid set md1 active with 3 out of 3 mirrors
Jan  7 17:19:09 markov kernel: [    3.391016] md1: detected capacity change from 0 to 748056215552
Jan  7 17:19:09 markov kernel: [    3.391169] md1: unknown partition table
Jan  7 17:19:09 markov kernel: [    2.220056] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  7 17:19:09 markov kernel: [    2.220103] ata1.01: SATA link down (SStatus 0 SControl 310)
Jan  7 17:19:09 markov kernel: [    2.228670] ata1.00: ATA-8: ST3750330NS, SN05, max UDMA/133
Jan  7 17:19:09 markov kernel: [    2.228709] ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Jan  7 17:19:09 markov kernel: [    2.244690] ata1.00: configured for UDMA/133
Jan  7 17:19:09 markov kernel: [    2.244845] scsi 0:0:0:0: Direct-Access     ATA      ST3750330NS      SN05 PQ: 0 ANSI: 5

Aside from the message that md1 isn't clean, the SATA link down messages
sound a little odd.  I'm not sure how to map from ataX to disk, but ata2
seems to be one of the new disks (sdb or sdc) and ata1 is the old one
(sda).  /dev/disk/by-path shows:

lrwxrwxrwx 1 root root 9 2013-01-07 17:15 pci-0000:00:1f.2-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 9 2013-01-07 17:15 pci-0000:00:1f.2-scsi-1:0:1:0 -> ../../sdb
lrwxrwxrwx 1 root root 9 2013-01-07 17:15 pci-0000:00:1f.5-scsi-0:0:0:0 -> ../../sdc

Ross
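
On mapping ataX to a drive letter: with newer kernels the ata port name
shows up directly in the sysfs path of the block device; on older ones
the usual trick is to correlate dmesg lines (a sketch; exact output
varies by kernel version):

# ls -l /sys/block/sd?        # look for an ataN component in the symlink target
# dmesg | grep -E 'ata[0-9]+\.[0-9]+: ATA-|Attached SCSI disk'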

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 22:38 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 2:54 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> I manually specified the current layout of the bigger disks (sdb and c);
> at least some of the time I specified the exact sector.  I picked 34
> because that seems to be the traditional offset for the first partition
> (and the one my tool generated when I gave it sizes in grosser units
> than sectors or told it to start at 0).

Today 34 is both old and incorrect, so you need to redo the layout.

> Apparently some disks do a logical to physical remap that includes an
> offset as well as a change in the sector size.  Should I check for that,
> or should I just assume that I should start my partitions on sectors
> that are multiples of 8?

I know of no disks that change the sector size.  It's always 512
logical, 4096 physical for reds.  There are supposed to be native 4Kn
drives between now and soon, but they aren't switchable between 512e
and 4Kn.  As for the offset, that still won't work because it'll change
the position of your partition map so you'd have to start over anyway,
even if it were available, which I don't think it is on a red.

So you just need to use a more recent partition tool and repartition
the disks correctly.

> You also asked what I meant by chatter in the logs about sdb.  Here are
> some entries from shortly before the system locked up:
> Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
> Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
> Jan  6 03:45:25 markov smartd[5368]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 109

smartmontools 5.38 is old, and this red drive isn't in its database, so
the data may be interpreted incorrectly.  108 C is very hot.  But I
wouldn't totally discount it when the drives are all busy on a resync,
if you get wildly different Raw_Values for this attribute between sdb
and sdc since they're the same drive model.

Chris Murphy

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 23:13 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 15:38 -0700, Chris Murphy wrote:
> On Jan 8, 2013, at 2:54 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > I manually specified the current layout of the bigger disks (sdb and c);
> > at least some of the time I specified the exact sector.  I picked 34
> > because that seems to be the traditional offset for the first partition
> > (and the one my tool generated when I gave it sizes in grosser units
> > than sectors or told it to start at 0).
>
> Today 34 is both old and incorrect, so you need to redo the layout.
>
> > Apparently some disks do a logical to physical remap that includes an
> > offset as well as a change in the sector size.  Should I check for that,
> > or should I just assume that I should start my partitions on sectors
> > that are multiples of 8?
>
> I know of no disks that change the sector size.  It's always 512
> logical, 4096 physical for reds.  There are supposed to be native 4Kn
> drives between now and soon, but they aren't switchable between 512e
> and 4Kn.  As for the offset, that still won't work because it'll change
> the position of your partition map so you'd have to start over anyway,
> even if it were available, which I don't think it is on a red.

I didn't mean that the disk changed its sector size dynamically, just
that, e.g., it might have physical sectors of 4k but report that it has
(logical) sectors of 512.

I'm not sure what you mean by the offset working.  I'm referring to the
fact that for some drives when you ask for logical sector n you actually
get physical sector n+1, n-2, or something like that.  This implies that
aligning on the logical sectors (meaning the ones the drive reports out)
might misalign on the physical ones.

> So you just need to use a more recent partition tool and repartition
> the disks correctly.

Correctly = start at multiples of 8?

> > You also asked what I meant by chatter in the logs about sdb.  Here are
> > some entries from shortly before the system locked up:
> > Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
> > Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
> > Jan  6 03:45:25 markov smartd[5368]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 109
>
> smartmontools 5.38 is old, and this red drive isn't in its database, so
> the data may be interpreted incorrectly.  108 C is very hot.  But I
> wouldn't totally discount it when the drives are all busy on a resync,
> if you get wildly different Raw_Values for this attribute between sdb
> and sdc since they're the same drive model.

That report was from before the system crash, when it was probably
doing very little, although disk intensive maintenance such as backups
or indexing the mail spool might have been happening.

I thought 108 was the scaled smart score, which is between 0 and 255
with higher being better.  The raw value of 45 seemed more plausible as
an actual temperature, though I guess there's no guarantee of that.

sdb and sdc have similar numbers for Temperature_Celsius.

On the logs and sign of disk failure, it's quite possible I don't know
what I'm looking for.  Given their size and the fact that the drive
failure seems clear, I think I'll spare you all the gory details.

Ross

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-09 0:43 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 4:13 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> I didn't mean that the disk changed its sector size dynamically, just
> that, e.g., it might have physical sectors of 4k but report that it has
> (logical) sectors of 512.

Reds are AF drives.  Any 512e AF drive should be reported as having 512
bytes logical sector size, and 4096 byte physical sector size.

> I'm not sure what you mean by the offset working.  I'm referring to the
> fact that for some drives when you ask for logical sector n you actually
> get physical sector n+1, n-2, or something like that.  This implies that
> aligning on the logical sectors (meaning the ones the drive reports out)
> might misalign on the physical ones.

There are some drives floating around that have a jumper switch,
targeted at Windows XP and older, that will do an offset like what you
describe.  The jumper isn't enabled by default, and you don't want to
use it.

>> So you just need to use a more recent partition tool and repartition
>> the disks correctly.
> Correctly = start at multiples of 8?

Don't think of it that way.  You can come up with a partition sector
start value divisible by 8 that is right in the middle of a physical
sector, which is what you don't want.  Recent partitioning tools (i.e.
in the last 3 years at least) do the right thing if you don't 2nd guess
them.  First partition starts at 2048.  Specify partition sizes in MiB.
Now you won't have a problem.

> I thought 108 was the scaled smart score, which is between 0 and 255
> with higher being better.  The raw value of 45 seemed more plausible as
> an actual temperature, though I guess there's no guarantee of that.

Yes.

> sdb and sdc have similar numbers for Temperature_Celsius.
>
> On the logs and sign of disk failure, it's quite possible I don't know
> what I'm looking for.  Given their size and the fact that the drive
> failure seems clear, I think I'll spare you all the gory details.

I think you just have the one disk that's giving you trouble.

Chris Murphy
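
Concretely, a 1MiB-aligned GPT layout for one of the 2TB drives could
be created along these lines (a sketch only; the sizes and partition
names are placeholders, and this wipes whatever is on the disk):

# parted -s -a optimal /dev/sdb mklabel gpt
# parted -s -a optimal /dev/sdb mkpart grub ext2 1MiB 489MiB
# parted -s -a optimal /dev/sdb mkpart boot ext3 489MiB 1465MiB
# parted -s -a optimal /dev/sdb mkpart swap linux-swap 1465MiB 3465MiB
# parted -s -a optimal /dev/sdb mkpart main ext3 3465MiB 100%
# parted /dev/sdb unit s print      # every Start should now be a multiple of 2048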

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 23:03 UTC
To: Ross Boylan; +Cc: Chris Murphy, linux-raid@vger.kernel.org

On 1/8/2013 3:54 PM, Ross Boylan wrote:
> I am less excited about that since discovering the message about sdb
> does not mean it's running at over 100 degrees celsius (the raw value is
> around 45).

You must ignore the VALUE and WORST columns for drive temp.  These are
"normalized" values only the smartmon idiots understand.  The actual
temp of 45C is a bit high, but well within the operating range for that
drive.  The WDC drives have a max temp (failure) of 80C IIRC, and a
normal max operating temp of 65C.  So you don't need to worry about
this drive's temp.

> The logs from the restart show
> Jan  7 17:19:09 markov kernel: [    2.928055] ata2.00: SATA link down (SStatus 0 SControl 0)
> Jan  7 17:19:09 markov kernel: [    2.928102] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan  7 17:19:09 markov kernel: [    2.944459] ata2.01: ATA-8: WDC WD2003FYYS-02W0B1, 01.01D02, max UDMA/133

> Jan  7 17:19:09 markov kernel: [    2.220056] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan  7 17:19:09 markov kernel: [    2.220103] ata1.01: SATA link down (SStatus 0 SControl 310)
> Jan  7 17:19:09 markov kernel: [    2.228670] ata1.00: ATA-8: ST3750330NS, SN05, max UDMA/133

> the SATA link down messages
> sound a little odd.

No mystery here.  These ports (links) are down because no drives are
connected to them, apparently.  Show full dmesg output, and tell us the
SAS/SATA controller and port count on each for the system in question.

-- 
Stan

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 5:55 UTC
To: linux-raid@vger.kernel.org

On Jan 7, 2013, at 7:05 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> Personalities : [raid1]
> md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
>
> md1 : active raid1 sda3[0] sdc4[2] sdb4[1]

fdisk -l ?

Where are sda2, sdc1, sdc3, sdb1, sdb3?

Chris Murphy

* Re: How do I tell which disk failed?
From: Mikael Abrahamsson @ 2013-01-08 9:55 UTC
To: Ross Boylan; +Cc: linux-raid

On Mon, 7 Jan 2013, Ross Boylan wrote:
> I see my array is reconstructing, but I can't tell which disk failed.
> Is there a way to?  I tried mdadm --detail on the array, mdadm --examine

You should look into the kernel logs, "dmesg" might tell you if there
hasn't been much other log activity lately, otherwise you have to check
the logfiles.  It will log any fail events there.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 17:20 UTC
To: Mikael Abrahamsson; +Cc: ross, linux-raid

On Tue, 2013-01-08 at 10:55 +0100, Mikael Abrahamsson wrote:
> On Mon, 7 Jan 2013, Ross Boylan wrote:
> > I see my array is reconstructing, but I can't tell which disk failed.
> > Is there a way to?  I tried mdadm --detail on the array, mdadm --examine
>
> You should look into the kernel logs, "dmesg" might tell you if there
> hasn't been much other log activity lately, otherwise you have to check
> the logfiles.  It will log any fail events there.

I checked the logs and didn't see anything about a drive failing, though
there were some smartd reports of changes in drive parameters like
temperature.

* Re: How do I tell which disk failed?
From: Peter Grandi @ 2013-01-08 21:24 UTC
To: Linux RAID

[ ... ]

>>> Personalities : [raid1]
>>> md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
>>>       96256 blocks [3/3] [UUU]
>>>
>>> md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
>>>       730523648 blocks [3/3] [UUU]
>>>       [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec

>>> I see my array is reconstructing, but I can't tell which
>>> disk failed. [ ... ] The system is currently sluggish and
>>> the load is 13 [ ... ]

If your kernel is one that puts IO wait in the load average, that's
expected if there is heavy IO load that makes the resync slow.

>> A more recent check show speed continuing to rise; [ ... ]

Perhaps because the 'fsck' ended, as the speed issue is likely to have
been a long 'fsck', consequent to an abrupt shutdown:

>> [ ... ] The resulting shutdown (which was a manual power
>> off) leaves the arrays and their components in a funky state.
>> When the system comes back, it fixes things up. [ ... ]

Plus the poor alignment of the 'sda' partitions cutting write rates
very significantly.  Your 'sd[bc]' disks instead are GPT partitioned
and that is by default 1MiB aligned, but you probably used some very
old tool and 'sd[bc]4' are 1KiB aligned:

  $ factor 6835938
  6835938: 2 3 17 29 2311

Someone else has pointed out the large difference in partition sizes
among 'sda' vs. 'sd[bc]'; while that does not cause a speed issue, the
RAID set will just reduce to the multiple of the smallest size.  Indeed
it is reported as 730M blocks, which is the equivalent of the
1461047490s reported by 'fdisk' for 'sda3'.  Probably you should have a
2-disk RAID1 of 'sd[bc]' alone.

>> Even if this did happen, in RAID 1 wouldn't some of the
>> components (partitions in my case) be deemed good and others
>> bad, with the latter resynced to match the former?  And if
>> that is happening, why can't I tell which partition(s) are
>> master (considered good) and which are not

Because you haven't read some relevant documentation...

>> (being overwritten with contents of the master)?

Two ways, for example:

* The "event counts" reported by 'mdadm --examine' will be different
  (a higher event count means more recent).

* 'iostat' will tell you which drives are being read and which written.

> I checked the logs and didn't see anything about a drive
> failing, though there were some smartd reports of changes in
> drive parameters like temperature.

The kernel logs always tell if a resync is triggered by a failure, but
note that a resync happens on a failure when a spare is added to the
RAID set to replace the failed drive, or when the drives are out of
sync because of an abrupt shutdown, which seems to be your case.

Anyhow, the ways to look at the health of the disk suggested by others
are somewhat misleading.  The first thing is to have a mental model of
possible disk failure modes...

Anyhow, the most relevant data from 'smartctl -A' are the number of
reallocated sectors (too many indicates a failing disk) and the SMART
selftest and error logs, to check the frequency of issues.
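
For completeness, the two checks Peter mentions look roughly like this
(a sketch; run while a resync is active, and substitute the real member
partitions; iostat comes from the sysstat package):

# mdadm --examine /dev/sda3 /dev/sdb4 /dev/sdc4 | grep -E 'Update Time|Events'
# iostat -x 5    # the source member shows mostly reads, the member being rebuilt mostly writes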

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 22:34 UTC
To: Ross Boylan; +Cc: Mikael Abrahamsson, linux-raid

On 1/8/2013 11:20 AM, Ross Boylan wrote:
> I checked the logs and didn't see anything about a drive failing

I'd guess you don't know what you're looking for.  If you post dmesg
output to the list we'll find it.  Although, given the S.M.A.R.T data
for the Seagate drive, it's probably unnecessary, as we know it's FUBAR.

-- 
Stan