* How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 2:05 UTC
To: linux-raid; +Cc: ross
I see my array is reconstructing, but I can't tell which disk failed.
Is there a way to? I tried mdadm --detail on the array, mdadm --examine
on the components, and looking at /proc/mdstat, but none of them give
much of a clue.
The disks have 0.90 metadata; mdadm v2.6.7.2 (14 November 2008) on a
2.6.32 kernel.
The machine (real, not virtual) hung, leaving few clues in the logs.
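
A minimal sketch of the checks that usually answer this (device and
array names below are examples):

# dmesg | grep -i -E 'raid1|md[01]|fail|error'   # md logs kicked members and IO errors here
# mdadm --detail /dev/md0 /dev/md1               # a kicked member shows up as faulty or removed
# grep . /sys/block/md1/md/dev-*/state           # per-member state (in_sync, faulty, spare, ...)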
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
      730523648 blocks [3/3] [UUU]
      [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec
unused devices: <none>
# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90
Creation Time : Mon Dec 15 06:50:18 2008
Raid Level : raid1
Array Size : 730523648 (696.68 GiB 748.06 GB)
Used Dev Size : 730523648 (696.68 GiB 748.06 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jan 7 17:17:41 2013
State : active, resyncing
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Rebuild Status : 0% complete
UUID : b77027df:d6aa474a:c4290e12:319afc54
Events : 0.5078497
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 20 1 active sync /dev/sdb4
2 8 36 2 active sync /dev/sdc4
The system is currently sluggish and the load is 13; I suspect whatever went wrong is happening again.
Thanks.
Ross

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 5:19 UTC
To: Ross Boylan; +Cc: linux-raid

On 1/7/2013 8:05 PM, Ross Boylan wrote:
> I see my array is reconstructing, but I can't tell which disk failed.

> md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
>       96256 blocks [3/3] [UUU]
>
> md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
>       730523648 blocks [3/3] [UUU]

Your two md/RAID1 arrays are built on partitions on the same set of 3
disks.  You likely didn't have a disk failure, or md0 would be
rebuilding as well.  Your failure, or hiccup, is of some other nature,
and apparently only affected md1.

> [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec

Rebuilding a RAID1 on modern hardware should scream.  You're getting
resync throughput of less than 1MB/s.  Estimated completion time is
9.8 _days_ to rebuild a mirror partition.  This is insanely high.

Either you've tweaked your resync throughput down to 1MB/s, or you have
some other process(es) doing serious IO, robbing the resync of
throughput.  Consider running iotop to determine if another process(es)
is eating IO bandwidth.

-- 
Stan
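
The md resync floor and ceiling can be inspected through procfs/sysfs;
a minimal sketch, assuming md1 and stock paths on a 2.6.32 kernel (the
50000 figure is only an example value):

# cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# cat /sys/block/md1/md/sync_speed                  # current resync rate, KiB/s
# echo 50000 > /proc/sys/dev/raid/speed_limit_min   # insist on ~50 MB/s even under competing IO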

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 6:59 UTC
To: stan; +Cc: linux-raid

On Mon, 2013-01-07 at 23:19 -0600, Stan Hoeppner wrote:
> On 1/7/2013 8:05 PM, Ross Boylan wrote:
> > I see my array is reconstructing, but I can't tell which disk failed.
>
> > md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
> >       96256 blocks [3/3] [UUU]
> >
> > md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
> >       730523648 blocks [3/3] [UUU]
>
> Your two md/RAID1 arrays are built on partitions on the same set of 3
> disks.  You likely didn't have a disk failure, or md0 would be
> rebuilding as well.  Your failure, or hiccup, is of some other nature,
> and apparently only affected md1.

I assume something went wrong while accessing one of the partitions, and
that there is a problem with the disk that partition is on.  Phrased
more carefully: which partition failed and is being resynced into md1?
I can't tell.  If I knew, would it be safe to mdadm --fail that
partition in the midst of the rebuild?

Once the system starts, md0 is almost never accessed (it's /boot).

> > [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec
>
> Rebuilding a RAID1 on modern hardware should scream.  You're getting
> resync throughput of less than 1MB/s.  Estimated completion time is 9.8
> _days_ to rebuild a mirror partition.  This is insanely high.

Yes.  It seems to be doing better now:

# date; cat /proc/mdstat
Mon Jan  7 21:37:46 PST 2013
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
      730523648 blocks [3/3] [UUU]
      [===========>.........]  resync = 57.8% (422846976/730523648) finish=452.5min speed=11329K/sec

unused devices: <none>

This is more in line with what I remember when I originally synced the
partitions, which I remember as taking 4-6 hours (it's clearly still
much slower than that pace).

> Either you've tweaked your resync throughput down to 1MB/s, or you have
> some other process(es) doing serious IO, robbing the resync of
> throughput.

Isn't it possible there's a hardware problem, e.g., one leading to a
failure/retry cycle?

> Consider running iotop to determine if another process(es)
> is eating IO bandwidth.

I did, though it's probably a little late.  Here's a fairly typical
result (the iotop command line is shown on the last line):

Total DISK READ: 99.09 K/s | Total DISK WRITE: 25.26 K/s
  PID  USER  DISK READ   DISK WRITE  SWAPIN   IO      COMMAND
 4263  root    0 B/s       0 B/s     0.00 %  8.40 %   [kjournald]
 1204  root   99.09 K/s    0 B/s     0.00 %  4.68 %   [kcopyd]
 1193  root    0 B/s       0 B/s     0.00 %  4.68 %   [kdmflush]
11874  root    0 B/s      25.26 K/s  0.00 %  0.00 %   python /usr/bin/iotop -d 2 -n 20 -b

When I restarted, the system had been effectively down for ~1.5 days,
so I guess it's possible that a lot of housekeeping was going on.
However, top didn't show any noticeable use of CPU.

A more recent check shows the speed continuing to rise; if the value is
an average and it started slow, that would explain it:

date; cat /proc/mdstat
Mon Jan  7 22:56:23 PST 2013
Personalities : [raid1]
md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
      730523648 blocks [3/3] [UUU]
      [==================>..]  resync = 91.8% (670929280/730523648) finish=19.4min speed=51057K/sec

Ross

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 7:17 UTC
To: linux-raid@vger.kernel.org

On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> Isn't it possible there's a hardware problem, e.g., leading to a
> failure/retry cycle?

smartctl -a /dev/sda
smartctl -a /dev/sdb
smartctl -a /dev/sdc

Compare them.  If there was a write failure reported by the drive, md
would have marked the device faulty.

Chris Murphy
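
A compact way to put the three drives side by side is a short loop over
the attributes that most often flag a failing disk (a sketch; adjust the
device list and attribute names to taste):

for d in /dev/sda /dev/sdb /dev/sdc; do
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
done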

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 7:49 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 00:17 -0700, Chris Murphy wrote:
> On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > Isn't it possible there's a hardware problem, e.g., leading to a
> > failure/retry cycle?
>
> smartctl -a /dev/sda
> smartctl -a /dev/sdb
> smartctl -a /dev/sdc
>
> Compare them.  If there was a write failure reported by the drive, md
> would have marked the device faulty.

SMART seems to think they are all OK, though my understanding of it is
limited (e.g., the logs showed SMART reporting Temperature_Celsius of
110, but I think that's a normalized value for a raw value of 42,
meaning the temperature is 42 degrees Celsius).  Do I need to manually
run a test before the report reflects current conditions?  At any rate,
I did run one (just a short test), and the drives passed.

The raw value (last column) for one of the parameters seems to be
changing extremely rapidly, and perhaps is overflowing:

# date; smartctl -a /dev/sda | grep 195
Mon Jan  7 23:11:03 PST 2013
195 Hardware_ECC_Recovered  0x001a  059  024  000  Old_age  Always  -  241377818
# date; smartctl -a /dev/sda | grep 195
Mon Jan  7 23:12:26 PST 2013
195 Hardware_ECC_Recovered  0x001a  056  024  000  Old_age  Always  -  3600778

Perhaps someone on this list can interpret that better than I can.

My thought was disk failure (not necessarily complete failure) ->
system lockup.  Continued disk flakiness leads to continued slowness
after the restart as, e.g., the disk keeps retrying operations that
fail.

I infer you have a different scenario in mind: the system freaks out
for a reason unrelated to the disk.  The resulting shutdown (which was
a manual power off) leaves the arrays and their components in a funky
state.  When the system comes back, it fixes things up.

Even if this did happen, in RAID 1 wouldn't some of the components
(partitions in my case) be deemed good and others bad, with the latter
resynced to match the former?  And if that is happening, why can't I
tell which partition(s) are the master (considered good) and which are
not (being overwritten with the contents of the master)?

The sync just completed, so I can no longer poke around while the
rebuild is in process.  Bad for learning and diagnosis, but good for
almost every other purpose.

Ross

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 8:48 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 12:49 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> The raw value (last column) for one of the parameters seems to be
> changing extremely rapidly, and perhaps is overflowing:
> # date; smartctl -a /dev/sda | grep 195
> Mon Jan  7 23:11:03 PST 2013
> 195 Hardware_ECC_Recovered  0x001a  059  024  000  Old_age  Always  -  241377818
> # date; smartctl -a /dev/sda | grep 195
> Mon Jan  7 23:12:26 PST 2013
> 195 Hardware_ECC_Recovered  0x001a  056  024  000  Old_age  Always  -  3600778

Not good.  The current value is 56, the worst is 24, and the threshold
is 0.  These are high values.  The firmware is doing its job in that
it's fixing errors.  But the fact it has to at this rate is not a good
sign.  Post sda's full attribute list.

Is this disk under warranty?  If it is I'd just get rid of it.

Chris Murphy
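
A quick way to see how fast that raw counter is actually moving is to
sample it on a fixed interval (a sketch; the 60-second interval is
arbitrary):

while true; do
    date
    smartctl -A /dev/sda | grep Hardware_ECC_Recovered
    sleep 60
done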

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 9:32 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 01:48 -0700, Chris Murphy wrote:
> On Jan 8, 2013, at 12:49 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> >
> > The raw value (last column) for one of the parameters seems to be
> > changing extremely rapidly, and perhaps is overflowing:
> > # date; smartctl -a /dev/sda | grep 195
> > Mon Jan  7 23:11:03 PST 2013
> > 195 Hardware_ECC_Recovered  0x001a  059  024  000  Old_age  Always  -  241377818
> > # date; smartctl -a /dev/sda | grep 195
> > Mon Jan  7 23:12:26 PST 2013
> > 195 Hardware_ECC_Recovered  0x001a  056  024  000  Old_age  Always  -  3600778
>
> Not good. The current value is 56, the worst is 24, and the threshold
> is 0. These are high values.

Do you mean 56, 24, and 0 are high values?  Or the raw values are high?
Is the raw value wrapping around?

> The firmware is doing its job in that it's fixing errors. But the fact
> it has to at this rate is not a good sign. Post sda's full attribute
> list.

--------------------------------------------------------------------------------
# date; smartctl -a /dev/sda
Tue Jan  8 01:20:54 PST 2013
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3750330NS
Serial Number:    9QK1MBCW
Firmware Version: SN05
User Capacity:    750,156,374,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Jan  8 01:20:54 2013 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
                                          was completed without error.
                                          Auto Offline Data Collection: Enabled.
# The next item seems peculiar.
# It sounds as if the test was aborted, yet the test result above is passed.
Self-test execution status:      (  41)   The self-test routine was interrupted
                                          by the host with a hard or soft reset.
Total time to complete Offline
data collection:                 ( 642)   seconds.
Offline data collection
capabilities:                    (0x7b)   SMART execute Offline immediate.
                                          Auto Offline data collection on/off support.
                                          Suspend Offline collection upon new command.
                                          Offline surface scan supported.
                                          Self-test supported.
                                          Conveyance Self-test supported.
                                          Selective Self-test supported.
SMART capabilities:              (0x0003) Saves SMART data before entering
                                          power-saving mode.
                                          Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
                                          General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1)   minutes.
Extended self-test routine
recommended polling time:        ( 177)   minutes.
Conveyance self-test routine
recommended polling time:        (   2)   minutes.
SCT capabilities:                (0x003d) SCT Status supported.
                                          SCT Feature Control supported.
                                          SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate      0x000f  075   063   044    Pre-fail Always  -           38010669
  3 Spin_Up_Time             0x0003  099   099   000    Pre-fail Always  -           0
  4 Start_Stop_Count         0x0032  100   100   020    Old_age  Always  -           101
  5 Reallocated_Sector_Ct    0x0033  099   099   036    Pre-fail Always  -           31
  7 Seek_Error_Rate          0x000f  078   060   030    Pre-fail Always  -           65563711282
  9 Power_On_Hours           0x0032  061   061   000    Old_age  Always  -           34776
 10 Spin_Retry_Count         0x0013  100   100   097    Pre-fail Always  -           0
 12 Power_Cycle_Count        0x0032  100   037   020    Old_age  Always  -           102
184 Unknown_Attribute        0x0032  100   100   099    Old_age  Always  -           0
187 Reported_Uncorrect       0x0032  100   100   000    Old_age  Always  -           0
188 Unknown_Attribute        0x0032  100   088   000    Old_age  Always  -           335
189 High_Fly_Writes          0x003a  100   100   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel  0x0022  066   057   045    Old_age  Always  -           34 (Lifetime Min/Max 34/36)
194 Temperature_Celsius      0x0022  034   043   000    Old_age  Always  -           34 (0 18 0 0)
195 Hardware_ECC_Recovered   0x001a  044   024   000    Old_age  Always  -           38010669
197 Current_Pending_Sector   0x0012  100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable    0x0010  100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count     0x003e  200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Interrupted (host reset)  00%        34773            -
# 2  Extended offline   Completed without error   00%        32464            -
# 3  Extended offline   Completed without error   00%        21385            -
# 4  Short offline      Completed without error   00%        20993            -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
--------------------------------------------------------------------------------

> Is this disk under warranty? If it is I'd just get rid of it.

I think it's over 3 years old, so probably not in warranty; it might be
if it's a 5 year warranty.

Fortunately, I've already got new disks in the machine.  The transition
has proved challenging.

I was more or less ready to go, but I wanted to do some experiments with
the alignment of partitions and other parameters.  Any suggestions would
be great.

Ross

P.S. Here are the results for sdb, which has also been generating
chatter in the logs.
--------------------------------------------------------------------------------
Mon Jan  7 22:51:59 PST 2013
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2003FYYS-02W0B1
Serial Number:    WD-WCAY00580447
Firmware Version: 01.01D02
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jan  7 22:51:59 2013 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
                                          was completed without error.
                                          Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
                                          without error or no self-test has ever
                                          been run.
Total time to complete Offline
data collection:                 (29760)  seconds.
Offline data collection
capabilities:                    (0x7b)   SMART execute Offline immediate.
                                          Auto Offline data collection on/off support.
                                          Suspend Offline collection upon new command.
                                          Offline surface scan supported.
                                          Self-test supported.
                                          Conveyance Self-test supported.
                                          Selective Self-test supported.
SMART capabilities:              (0x0003) Saves SMART data before entering
                                          power-saving mode.
                                          Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
                                          General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2)   minutes.
Extended self-test routine
recommended polling time:        ( 255)   minutes.
Conveyance self-test routine
recommended polling time:        (   5)   minutes.
SCT capabilities:                (0x303f) SCT Status supported.
                                          SCT Feature Control supported.
                                          SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate      0x002f  200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time             0x0027  253   253   021    Pre-fail Always  -           7725
  4 Start_Stop_Count         0x0032  100   100   000    Old_age  Always  -           14
  5 Reallocated_Sector_Ct    0x0033  200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate          0x002e  200   200   000    Old_age  Always  -           0
  9 Power_On_Hours           0x0032  098   098   000    Old_age  Always  -           1676
 10 Spin_Retry_Count         0x0032  100   253   000    Old_age  Always  -           0
 11 Calibration_Retry_Count  0x0032  100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count        0x0032  100   100   000    Old_age  Always  -           13
192 Power-Off_Retract_Count  0x0032  200   200   000    Old_age  Always  -           8
193 Load_Cycle_Count         0x0032  200   200   000    Old_age  Always  -           5
194 Temperature_Celsius      0x0022  107   102   000    Old_age  Always  -           45
196 Reallocated_Event_Count  0x0032  200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector   0x0032  200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable    0x0030  200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count     0x0032  200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate    0x0008  200   200   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error   00%        1676             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
--------------------------------------------------------------------------------
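
If anyone wants to re-run the tests cleanly, the usual sequence is
roughly this (a sketch; the long test on the 750GB drive takes about
the 177 minutes smartctl quotes above, and is best run while the box
is otherwise idle):

# smartctl -t short /dev/sda     # quick sanity check, ~1 minute
# smartctl -t long /dev/sda      # full surface scan
# smartctl -l selftest /dev/sda  # read the results afterwards
# smartctl -l error /dev/sda     # the drive's own error log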

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 17:36 UTC
To: Ross Boylan; +Cc: linux-raid@vger.kernel.org

On Jan 8, 2013, at 2:32 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> On Tue, 2013-01-08 at 01:48 -0700, Chris Murphy wrote:
>> Not good. The current value is 56, the worst is 24, and the threshold
>> is 0. These are high values.
> Do you mean 56, 24, and 0 are high values?  Or the raw values are high?

0 is the point at which the drive will change its health from passing
to failing.  It's gotten as low as 24.  So I'd say it's pre-failing, it
just isn't telling you that literally.  As raw values go up, the current
value goes down.  The closer current and threshold are, the worse the
health of the drive for that particular attribute.  It's actually a bit
more complicated than that; there's lots of discussion of this on the
smartmontools site.

> Is the raw value wrapping around?

No idea.

> # date; smartctl -a /dev/sda
> ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate      0x000f  075   063   044    Pre-fail Always  -           38010669
>   3 Spin_Up_Time             0x0003  099   099   000    Pre-fail Always  -           0
>   4 Start_Stop_Count         0x0032  100   100   020    Old_age  Always  -           101
>   5 Reallocated_Sector_Ct    0x0033  099   099   036    Pre-fail Always  -           31
>   7 Seek_Error_Rate          0x000f  078   060   030    Pre-fail Always  -           65563711282
>   9 Power_On_Hours           0x0032  061   061   000    Old_age  Always  -           34776
>  10 Spin_Retry_Count         0x0013  100   100   097    Pre-fail Always  -           0
>  12 Power_Cycle_Count        0x0032  100   037   020    Old_age  Always  -           102
> 184 Unknown_Attribute        0x0032  100   100   099    Old_age  Always  -           0
> 187 Reported_Uncorrect       0x0032  100   100   000    Old_age  Always  -           0
> 188 Unknown_Attribute        0x0032  100   088   000    Old_age  Always  -           335
> 189 High_Fly_Writes          0x003a  100   100   000    Old_age  Always  -           0
> 190 Airflow_Temperature_Cel  0x0022  066   057   045    Old_age  Always  -           34 (Lifetime Min/Max 34/36)
> 194 Temperature_Celsius      0x0022  034   043   000    Old_age  Always  -           34 (0 18 0 0)
> 195 Hardware_ECC_Recovered   0x001a  044   024   000    Old_age  Always  -           38010669
> 197 Current_Pending_Sector   0x0012  100   100   000    Old_age  Always  -           0
> 198 Offline_Uncorrectable    0x0010  100   100   000    Old_age  Offline -           0
> 199 UDMA_CRC_Error_Count     0x003e  200   200   000    Old_age  Always  -           0

So there have been reallocated sectors, so some of them are bad.  And
since they tend to be located in groups, it probably explains why you
had a slow initial rebuild that then sped up.

Again, if it's under warranty, get rid of it.  If it's not, well, I'd
probably still get rid of it or use it for something inconsequential,
after using hdparm to secure erase it (or use dd to write zeros, which
is OK for HDDs, not OK for SSDs).

> Fortunately, I've already got new disks in the machine.  The transition
> has proved challenging.
>
> I was more or less ready to go, but I wanted to do some experiments with
> the alignment of partitions and other parameters.  Any suggestions would
> be great.

You must've missed the other email I sent about alignment.  The reds
are not aligned.  And you're using completely whacky partition sizes
between sda and sd[bc] for reasons I don't understand.
http://www.spinics.net/lists/raid/msg41506.html

> P.S. Here are the results for sdb, which has also been generating
> chatter in the logs.

What do you mean by chatter in the logs?

I don't see anything wrong here, but as something like 35% of drive
failures occur without SMART ever indicating a single problem, who
knows.

Chris Murphy

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 22:30 UTC
To: Ross Boylan; +Cc: Chris Murphy, linux-raid@vger.kernel.org

On 1/8/2013 3:32 AM, Ross Boylan wrote:
> # date; smartctl -a /dev/sda
> Device Model:     ST3750330NS
> Serial Number:    9QK1MBCW
> ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate      0x000f  075   063   044    Pre-fail Always  -           38010669
>   7 Seek_Error_Rate          0x000f  078   060   030    Pre-fail Always  -           65563711282
> 195 Hardware_ECC_Recovered   0x001a  044   024   000    Old_age  Always  -           38010669

This 750GB Seagate drive is FUBAR.  Replace it as soon as possible.

> Device Model:     WDC WD2003FYYS-02W0B1
> Serial Number:    WD-WCAY00580447
> ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate      0x002f  200   200   051    Pre-fail Always  -           0
>   5 Reallocated_Sector_Ct    0x0033  200   200   140    Pre-fail Always  -           0
>   7 Seek_Error_Rate          0x002e  200   200   000    Old_age  Always  -           0
> 196 Reallocated_Event_Count  0x0032  200   200   000    Old_age  Always  -           0
> 197 Current_Pending_Sector   0x0032  200   200   000    Old_age  Always  -           0
> 198 Offline_Uncorrectable    0x0030  200   200   000    Old_age  Offline -           0

This Western Digital appears to be fine.  Please show log entries that
lead you to believe it has problems.

-- 
Stan
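
For the record, swapping the failing member out of one of these mirrors
would look roughly like this (a sketch only; /dev/sda3 and /dev/sdd4 are
placeholder names, and current backups are assumed before touching the
array):

# mdadm /dev/md1 --fail /dev/sda3 --remove /dev/sda3
# mdadm /dev/md1 --add /dev/sdd4        # the new, properly aligned partition
# cat /proc/mdstat                      # wait for the resync to finish
# mdadm --grow /dev/md1 --size=max      # only once every member is the larger size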

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 7:59 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 00:17 -0700, Chris Murphy wrote:
> On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > Isn't it possible there's a hardware problem, e.g., leading to a
> > failure/retry cycle?
>
> smartctl -a /dev/sda
> smartctl -a /dev/sdb
> smartctl -a /dev/sdc
>
> Compare them.  If there was a write failure reported by the drive, md
> would have marked the device faulty.

In response to your other query about the locations of the partitions:

# parted /dev/sda unit s p select /dev/sdb p select /dev/sdc p
Model: ATA ST3750330NS (scsi)
Disk /dev/sda: 1465149168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start     End          Size         Type     File system  Flags
 1      63s       192779s      192717s      primary  ext3         boot, raid
 2      192780s   4096574s     3903795s     primary
 3      4096575s  1465144064s  1461047490s  primary               raid

Using /dev/sdb
Model: ATA WDC WD2003FYYS-0 (scsi)
Disk /dev/sdb: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start     End          Size         File system  Name                   Flags
 1      34s       999999s      999966s                   extended boot loaders
 2      1000000s  2929687s     1929688s     ext3         /boot                  boot
 3      2929688s  6835937s     3906250s                  swap
 4      6835938s  3907029134s  3900193197s               main

Using /dev/sdc
Model: ATA WDC WD2003FYYS-0 (scsi)
Disk /dev/sdc: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start     End          Size         File system  Name                   Flags
 1      34s       999999s      999966s                   extended boot loaders
 2      1000000s  2929687s     1929688s     ext3         boot                   boot
 3      2929688s  6835937s     3906250s                  swap
 4      6835938s  3907029134s  3900193197s               main

BTW the spec sheet for the WDC "red" drives says they use advanced
formatting (I may not have the buzzword quite right) with physical
sectors of 4k.  So the reported sector size is a fib.

Ross

P.S. I didn't explicitly respond to Stan's comment that I might have
tweaked my sync speed down.  I haven't deliberately, though I suppose
something could have done it behind my back.  The increased performance
as time passed does suggest something else was loading the io system,
though it seems odd that happened without noticeable cpu use.

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 9:10 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 12:59 AM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> Using /dev/sdb
> Model: ATA WDC WD2003FYYS-0 (scsi)
> Disk /dev/sdb: 3907029168s
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start     End          Size         File system  Name                   Flags
>  1      34s       999999s      999966s                   extended boot loaders
>  2      1000000s  2929687s     1929688s     ext3         /boot                  boot
>  3      2929688s  6835937s     3906250s                  swap
>  4      6835938s  3907029134s  3900193197s               main
>
> Using /dev/sdc
> Model: ATA WDC WD2003FYYS-0 (scsi)
> Disk /dev/sdc: 3907029168s
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start     End          Size         File system  Name                   Flags
>  1      34s       999999s      999966s                   extended boot loaders
>  2      1000000s  2929687s     1929688s     ext3         boot                   boot
>  3      2929688s  6835937s     3906250s                  swap
>  4      6835938s  3907029134s  3900193197s               main
>
> BTW the spec sheet for the WDC "red" drives says they use advanced
> formatting (I may not have the buzzword quite right) with physical
> sectors of 4k.  So the reported sector size is a fib.

Yeah you're using an old version of parted for it to not recognize that
the physical sectors are 4096 bytes.  The thing is, that it's a 512e
disk, so the LBA's are still 512 bytes.  And by the looks of it, your
partitions are not aligned on those 4K physical sectors because the
start value is 34s.  In any recent fdisk or parted or gdisk, the start
sector is 2048 (1MiB), and each partition is aligned on 8-sector
boundaries.  So your disks aren't properly partitioned, and you're
getting a performance hit because of it.

What I'm not getting is why your md0, comprised of sda1 at 192717s, and
sd[bc]2 are 1929688s.  What am I missing here?  Because those values
aren't at all the same.  It's a 10x difference.

And then with md1, comprised of sda3 at 1461047490s, and sd[bc]4 are
3900193197s.  A 2.66x difference.  What is this?  sda1 is 696GiB, while
sd[bc]4 are 1.8TiB each?

Ummm…

Chris Murphy
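
One way to confirm what the kernel sees and whether an existing
partition start is 4K-aligned (a sketch; sdb and partition 4 are just
examples):

# cat /sys/block/sdb/queue/logical_block_size /sys/block/sdb/queue/physical_block_size
# cat /sys/block/sdb/sdb4/start                     # start sector of the partition
# echo $(( $(cat /sys/block/sdb/sdb4/start) % 8 ))  # 0 means the start falls on a 4K boundary
# parted /dev/sdb align-check optimal 4             # needs a newer parted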

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 21:54 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 02:10 -0700, Chris Murphy wrote:
> Yeah you're using an old version of parted for it to not recognize that
> the physical sectors are 4096 bytes.  The thing is, that it's a 512e
> disk, so the LBA's are still 512 bytes.  And by the looks of it, your
> partitions are not aligned on those 4K physical sectors because the
> start value is 34s.  In any recent fdisk or parted or gdisk, the start
> sector is 2048 (1MiB), and each partition is aligned on 8-sector
> boundaries.  So your disks aren't properly partitioned, and you're
> getting a performance hit because of it.
>
> What I'm not getting is why your md0, comprised of sda1 at 192717s, and
> sd[bc]2 are 1929688s.  What am I missing here?  Because those values
> aren't at all the same.  It's a 10x difference.

I'm migrating the array from an old, smaller disk (it was a pair of
disks, but I've already pulled one) to newer, larger disks.  Eventually
the current sda will go away (I was going to keep using it, but given
the recent problems, as you suggest, best to ditch it) and the RAID
arrays will grow to fill the new space.

I manually specified the current layout of the bigger disks (sdb and c);
at least some of the time I specified the exact sector.  I picked 34
because that seems to be the traditional offset for the first partition
(and the one my tool generated when I gave it sizes in grosser units
than sectors or told it to start at 0).

Apparently some disks do a logical to physical remap that includes an
offset as well as a change in the sector size.  Should I check for that,
or should I just assume that I should start my partitions on sectors
that are multiples of 8?

You also asked what I meant by chatter in the logs about sdb.  Here are
some entries from shortly before the system locked up:

Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
Jan  6 03:45:25 markov smartd[5368]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 109

I am less excited about that since discovering that the message about
sdb does not mean it's running at over 100 degrees Celsius (the raw
value is around 45).

The logs from the restart show:

Jan  7 17:19:09 markov kernel: [    2.928055] ata2.00: SATA link down (SStatus 0 SControl 0)
Jan  7 17:19:09 markov kernel: [    2.928102] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  7 17:19:09 markov kernel: [    2.944459] ata2.01: ATA-8: WDC WD2003FYYS-02W0B1, 01.01D02, max UDMA/133
Jan  7 17:19:09 markov kernel: [    2.944498] ata2.01: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Jan  7 17:19:09 markov kernel: [    2.952486] ata2.01: configured for UDMA/133
Jan  7 17:19:09 markov kernel: [    2.952642] scsi 1:0:1:0: Direct-Access     ATA      WDC WD2003FYYS-0 01.0 PQ: 0 ANSI: 5
Jan  7 17:19:09 markov kernel: [    2.952918] scsi 2:0:0:0: Direct-Access     ATA      WDC WD2003FYYS-0 01.0 PQ: 0 ANSI: 5
Jan  7 17:19:09 markov kernel: [    2.953695] scsi 3:0:0:0: CD-ROM            TSSTcorp CDDVDW SH-S223B  SB00 PQ: 0 ANSI: 5
Jan  7 17:19:09 markov kernel: [    3.289403] md: md0 stopped.
Jan  7 17:19:09 markov kernel: [    3.328423] md: md1 stopped.
Jan  7 17:19:09 markov kernel: [    3.382868] md: bind<sdb4>
Jan  7 17:19:09 markov kernel: [    3.383054] md: bind<sdc4>
Jan  7 17:19:09 markov kernel: [    3.383347] md: bind<sda3>
Jan  7 17:19:09 markov kernel: [    3.390925] raid1: md1 is not clean -- starting background reconstruction
Jan  7 17:19:09 markov kernel: [    3.390963] raid1: raid set md1 active with 3 out of 3 mirrors
Jan  7 17:19:09 markov kernel: [    3.391016] md1: detected capacity change from 0 to 748056215552
Jan  7 17:19:09 markov kernel: [    3.391169] md1: unknown partition table
Jan  7 17:19:09 markov kernel: [    2.220056] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  7 17:19:09 markov kernel: [    2.220103] ata1.01: SATA link down (SStatus 0 SControl 310)
Jan  7 17:19:09 markov kernel: [    2.228670] ata1.00: ATA-8: ST3750330NS, SN05, max UDMA/133
Jan  7 17:19:09 markov kernel: [    2.228709] ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Jan  7 17:19:09 markov kernel: [    2.244690] ata1.00: configured for UDMA/133
Jan  7 17:19:09 markov kernel: [    2.244845] scsi 0:0:0:0: Direct-Access     ATA      ST3750330NS      SN05 PQ: 0 ANSI: 5

Aside from the message that md1 isn't clean, the SATA link down messages
sound a little odd.  I'm not sure how to map from ataX to disk, but ata2
seems to be one of the new disks (sdb or sdc) and ata1 is the old one
(sda).  /dev/disk/by-path shows:

lrwxrwxrwx 1 root root 9 2013-01-07 17:15 pci-0000:00:1f.2-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 9 2013-01-07 17:15 pci-0000:00:1f.2-scsi-1:0:1:0 -> ../../sdb
lrwxrwxrwx 1 root root 9 2013-01-07 17:15 pci-0000:00:1f.5-scsi-0:0:0:0 -> ../../sdc

Ross
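
On mapping ataX to a drive letter: with newer kernels the ata port name
shows up directly in the sysfs path of the block device; on older ones
the usual trick is to correlate dmesg lines (a sketch; exact output
varies by kernel version):

# ls -l /sys/block/sd?        # look for an ataN component in the symlink target
# dmesg | grep -E 'ata[0-9]+\.[0-9]+: ATA-|Attached SCSI disk'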

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 22:38 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 2:54 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> I manually specified the current layout of the bigger disks (sdb and c);
> at least some of the time I specified the exact sector.  I picked 34
> because that seems to be the traditional offset for the first partition
> (and the one my tool generated when I gave it sizes in grosser units
> than sectors or told it to start at 0).

Today 34 is both old and incorrect, so you need to redo the layout.

> Apparently some disks do a logical to physical remap that includes an
> offset as well as a change in the sector size.  Should I check for that,
> or should I just assume that I should start my partitions on sectors
> that are multiples of 8?

I know of no disks that change the sector size.  It's always 512
logical, 4096 physical for reds.  There are supposed to be native 4Kn
drives between now and soon, but they aren't switchable between 512e
and 4Kn.  As for the offset, that still won't work because it'll change
the position of your partition map so you'd have to start over anyway,
even if it were available, which I don't think it is on a red.

So you just need to use a more recent partition tool and repartition
the disks correctly.

> You also asked what I meant by chatter in the logs about sdb.  Here are
> some entries from shortly before the system locked up:
> Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
> Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
> Jan  6 03:45:25 markov smartd[5368]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 109

smartmontools 5.38 is old, and this red drive isn't in its database, so
the data may be interpreted incorrectly.  108 C is very hot.  But I
wouldn't totally discount it when the drives are all busy on a resync,
if you get wildly different Raw_Values for this attribute between sdb
and sdc since they're the same drive model.

Chris Murphy

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 23:13 UTC
To: Chris Murphy; +Cc: ross, linux-raid@vger.kernel.org

On Tue, 2013-01-08 at 15:38 -0700, Chris Murphy wrote:
> On Jan 8, 2013, at 2:54 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > I manually specified the current layout of the bigger disks (sdb and c);
> > at least some of the time I specified the exact sector.  I picked 34
> > because that seems to be the traditional offset for the first partition
> > (and the one my tool generated when I gave it sizes in grosser units
> > than sectors or told it to start at 0).
>
> Today 34 is both old and incorrect, so you need to redo the layout.
>
> > Apparently some disks do a logical to physical remap that includes an
> > offset as well as a change in the sector size.  Should I check for that,
> > or should I just assume that I should start my partitions on sectors
> > that are multiples of 8?
>
> I know of no disks that change the sector size.  It's always 512
> logical, 4096 physical for reds.  There are supposed to be native 4Kn
> drives between now and soon, but they aren't switchable between 512e
> and 4Kn.  As for the offset, that still won't work because it'll change
> the position of your partition map so you'd have to start over anyway,
> even if it were available, which I don't think it is on a red.

I didn't mean that the disk changed its sector size dynamically, just
that, e.g., it might have physical sectors of 4k but report that it has
(logical) sectors of 512.

I'm not sure what you mean by the offset working.  I'm referring to the
fact that for some drives when you ask for logical sector n you actually
get physical sector n+1, n-2, or something like that.  This implies that
aligning on the logical sectors (meaning the ones the drive reports out)
might misalign on the physical ones.

> So you just need to use a more recent partition tool and repartition
> the disks correctly.

Correctly = start at multiples of 8?

> > You also asked what I meant by chatter in the logs about sdb.  Here are
> > some entries from shortly before the system locked up:
> > Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 65
> > Jan  6 03:45:24 markov smartd[5368]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
> > Jan  6 03:45:25 markov smartd[5368]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 109
>
> smartmontools 5.38 is old, and this red drive isn't in its database, so
> the data may be interpreted incorrectly.  108 C is very hot.  But I
> wouldn't totally discount it when the drives are all busy on a resync,
> if you get wildly different Raw_Values for this attribute between sdb
> and sdc since they're the same drive model.

That report was from before the system crash, when it was probably
doing very little, although disk intensive maintenance such as backups
or indexing the mail spool might have been happening.

I thought 108 was the scaled smart score, which is between 0 and 255
with higher being better.  The raw value of 45 seemed more plausible as
an actual temperature, though I guess there's no guarantee of that.

sdb and sdc have similar numbers for Temperature_Celsius.

On the logs and sign of disk failure, it's quite possible I don't know
what I'm looking for.  Given their size and the fact that the drive
failure seems clear, I think I'll spare you all the gory details.

Ross

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-09 0:43 UTC
To: linux-raid@vger.kernel.org

On Jan 8, 2013, at 4:13 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> I didn't mean that the disk changed its sector size dynamically, just
> that, e.g., it might have physical sectors of 4k but report that it has
> (logical) sectors of 512.

Reds are AF drives.  Any 512e AF drive should be reported as having 512
bytes logical sector size, and 4096 byte physical sector size.

> I'm not sure what you mean by the offset working.  I'm referring to the
> fact that for some drives when you ask for logical sector n you actually
> get physical sector n+1, n-2, or something like that.  This implies that
> aligning on the logical sectors (meaning the ones the drive reports out)
> might misalign on the physical ones.

There are some drives floating around that have a jumper switch,
targeted at Windows XP and older, that will do an offset like what you
describe.  The jumper isn't enabled by default, and you don't want to
use it.

>> So you just need to use a more recent partition tool and repartition
>> the disks correctly.
> Correctly = start at multiples of 8?

Don't think of it that way.  You can come up with a partition sector
start value divisible by 8 that is right in the middle of a physical
sector, which is what you don't want.  Recent partitioning tools (i.e.
in the last 3 years at least) do the right thing if you don't 2nd guess
them.  First partition starts at 2048.  Specify partition sizes in MiB.
Now you won't have a problem.

> I thought 108 was the scaled smart score, which is between 0 and 255
> with higher being better.  The raw value of 45 seemed more plausible as
> an actual temperature, though I guess there's no guarantee of that.

Yes.

> sdb and sdc have similar numbers for Temperature_Celsius.
>
> On the logs and sign of disk failure, it's quite possible I don't know
> what I'm looking for.  Given their size and the fact that the drive
> failure seems clear, I think I'll spare you all the gory details.

I think you just have the one disk that's giving you trouble.

Chris Murphy
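
Concretely, a 1MiB-aligned GPT layout for one of the 2TB drives could
be created along these lines (a sketch only; the sizes and partition
names are placeholders, and this wipes whatever is on the disk):

# parted -s -a optimal /dev/sdb mklabel gpt
# parted -s -a optimal /dev/sdb mkpart grub ext2 1MiB 489MiB
# parted -s -a optimal /dev/sdb mkpart boot ext3 489MiB 1465MiB
# parted -s -a optimal /dev/sdb mkpart swap linux-swap 1465MiB 3465MiB
# parted -s -a optimal /dev/sdb mkpart main ext3 3465MiB 100%
# parted /dev/sdb unit s print      # every Start should now be a multiple of 2048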

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 23:03 UTC
To: Ross Boylan; +Cc: Chris Murphy, linux-raid@vger.kernel.org

On 1/8/2013 3:54 PM, Ross Boylan wrote:
> I am less excited about that since discovering the message about sdb
> does not mean it's running at over 100 degrees celsius (the raw value is
> around 45).

You must ignore the VALUE and WORST columns for drive temp.  These are
"normalized" values only the smartmon idiots understand.  The actual
temp of 45C is a bit high, but well within the operating range for that
drive.  The WDC drives have a max temp (failure) of 80C IIRC, and a
normal max operating temp of 65C.  So you don't need to worry about
this drive's temp.

> The logs from the restart show
> Jan  7 17:19:09 markov kernel: [    2.928055] ata2.00: SATA link down (SStatus 0 SControl 0)
> Jan  7 17:19:09 markov kernel: [    2.928102] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan  7 17:19:09 markov kernel: [    2.944459] ata2.01: ATA-8: WDC WD2003FYYS-02W0B1, 01.01D02, max UDMA/133

> Jan  7 17:19:09 markov kernel: [    2.220056] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan  7 17:19:09 markov kernel: [    2.220103] ata1.01: SATA link down (SStatus 0 SControl 310)
> Jan  7 17:19:09 markov kernel: [    2.228670] ata1.00: ATA-8: ST3750330NS, SN05, max UDMA/133

> the SATA link down messages
> sound a little odd.

No mystery here.  These ports (links) are down because no drives are
connected to them, apparently.  Show full dmesg output, and tell us the
SAS/SATA controller and port count on each for the system in question.

-- 
Stan

* Re: How do I tell which disk failed?
From: Chris Murphy @ 2013-01-08 5:55 UTC
To: linux-raid@vger.kernel.org

On Jan 7, 2013, at 7:05 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> Personalities : [raid1]
> md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
>
> md1 : active raid1 sda3[0] sdc4[2] sdb4[1]

fdisk -l ?

Where are sda2, sdc1, sdc3, sdb1, sdb3?

Chris Murphy

* Re: How do I tell which disk failed?
From: Mikael Abrahamsson @ 2013-01-08 9:55 UTC
To: Ross Boylan; +Cc: linux-raid

On Mon, 7 Jan 2013, Ross Boylan wrote:
> I see my array is reconstructing, but I can't tell which disk failed.
> Is there a way to?  I tried mdadm --detail on the array, mdadm --examine

You should look into the kernel logs, "dmesg" might tell you if there
hasn't been much other log activity lately, otherwise you have to check
the logfiles.  It will log any fail events there.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: How do I tell which disk failed?
From: Ross Boylan @ 2013-01-08 17:20 UTC
To: Mikael Abrahamsson; +Cc: ross, linux-raid

On Tue, 2013-01-08 at 10:55 +0100, Mikael Abrahamsson wrote:
> On Mon, 7 Jan 2013, Ross Boylan wrote:
> > I see my array is reconstructing, but I can't tell which disk failed.
> > Is there a way to?  I tried mdadm --detail on the array, mdadm --examine
>
> You should look into the kernel logs, "dmesg" might tell you if there
> hasn't been much other log activity lately, otherwise you have to check
> the logfiles.  It will log any fail events there.

I checked the logs and didn't see anything about a drive failing, though
there were some smartd reports of changes in drive parameters like
temperature.

* Re: How do I tell which disk failed?
From: Peter Grandi @ 2013-01-08 21:24 UTC
To: Linux RAID

[ ... ]

>>> Personalities : [raid1]
>>> md0 : active raid1 sda1[0] sdc2[2] sdb2[1]
>>>       96256 blocks [3/3] [UUU]
>>>
>>> md1 : active raid1 sda3[0] sdc4[2] sdb4[1]
>>>       730523648 blocks [3/3] [UUU]
>>>       [>....................]  resync =  0.4% (3382400/730523648) finish=14164.9min speed=855K/sec

>>> I see my array is reconstructing, but I can't tell which
>>> disk failed. [ ... ] The system is currently sluggish and
>>> the load is 13 [ ... ]

If your kernel is one that puts IO wait in the load average, that's
expected if there is heavy IO load that makes the resync slow.

>> A more recent check show speed continuing to rise; [ ... ]

Perhaps because the 'fsck' ended, as the speed issue is likely to have
been a long 'fsck', consequent to an abrupt shutdown:

>> [ ... ] The resulting shutdown (which was a manual power
>> off) leaves the arrays and their components in a funky state.
>> When the system comes back, it fixes things up. [ ... ]

Plus the poor alignment of the 'sda' partitions cutting write rates
very significantly.  Your 'sd[bc]' disks instead are GPT partitioned
and that is by default 1MiB aligned, but you probably used some very
old tool and 'sd[bc]4' are 1KiB aligned:

  $ factor 6835938
  6835938: 2 3 17 29 2311

Someone else has pointed out the large difference in partition sizes
among 'sda' vs. 'sd[bc]'; while that does not cause a speed issue, the
RAID set will just reduce to the multiple of the smallest size.  Indeed
it is reported as 730M blocks, which is the equivalent of the
1461047490s reported by 'fdisk' for 'sda3'.  Probably you should have a
2-disk RAID1 of 'sd[bc]' alone.

>> Even if this did happen, in RAID 1 wouldn't some of the
>> components (partitions in my case) be deemed good and others
>> bad, with the latter resynced to match the former?  And if
>> that is happening, why can't I tell which partition(s) are
>> master (considered good) and which are not

Because you haven't read some relevant documentation...

>> (being overwritten with contents of the master)?

Two ways, for example:

* The "event counts" reported by 'mdadm --examine' will be different
  (a higher event count means more recent).

* 'iostat' will tell you which drives are being read and which written.

> I checked the logs and didn't see anything about a drive
> failing, though there were some smartd reports of changes in
> drive parameters like temperature.

The kernel logs always tell if a resync is triggered by a failure, but
note that a resync happens on a failure when a spare is added to the
RAID set to replace the failed drive, or when the drives are out of
sync because of an abrupt shutdown, which seems to be your case.

Anyhow, the ways to look at the health of the disk suggested by others
are somewhat misleading.  The first thing is to have a mental model of
possible disk failure modes...

Anyhow, the most relevant data from 'smartctl -A' are the number of
reallocated sectors (too many indicates a failing disk) and the SMART
selftest and error logs, to check the frequency of issues.
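
For completeness, the two checks Peter mentions look roughly like this
(a sketch; run while a resync is active, and substitute the real member
partitions; iostat comes from the sysstat package):

# mdadm --examine /dev/sda3 /dev/sdb4 /dev/sdc4 | grep -E 'Update Time|Events'
# iostat -x 5    # the source member shows mostly reads, the member being rebuilt mostly writes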

* Re: How do I tell which disk failed?
From: Stan Hoeppner @ 2013-01-08 22:34 UTC
To: Ross Boylan; +Cc: Mikael Abrahamsson, linux-raid

On 1/8/2013 11:20 AM, Ross Boylan wrote:
> I checked the logs and didn't see anything about a drive failing

I'd guess you don't know what you're looking for.  If you post dmesg
output to the list we'll find it.  Although, given the S.M.A.R.T data
for the Seagate drive, it's probably unnecessary, as we know it's FUBAR.

-- 
Stan