* Request for assistance
@ 2016-07-06 0:13 o1bigtenor
2016-07-06 1:55 ` Adam Goryachev
2016-07-06 7:39 ` keld
0 siblings, 2 replies; 10+ messages in thread
From: o1bigtenor @ 2016-07-06 0:13 UTC (permalink / raw)
To: Linux-RAID
Greetings
Running a RAID 10 array with four 3 TB drives. Have a UPS, but this area
gets significant lightning and also brownout (rural power) events.
Found the array was read-only this morning. Thought that rebooting the
system might correct things - - - it did not, as the array did not
load.
Commands used, followed by system response:
mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
cat /proc/mdstat
md0 : inactive sdc1[5](S) sdf1[8](S) sde1[7](S) sdb1[4](S)
mdadm -E /dev/sdb1
sdc1
sde1
sdf1
Everything is the same except for two items:
sde and sdf have an update time of July 04 05:50:46,
events 64841,
array state of AAAA
sdb and sdc have an update time of July 05 01:57:38,
events 64844,
array state of AAA.
Do I just re-create the array?
TIA
Dee
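[Editor's sketch: the stale-pair diagnosis above (two members at events 64841 against two at 64844) can be scripted. The file name and field layout below are illustrative assumptions working from a saved transcript with this thread's values, not live `mdadm -E` output.]

```shell
# Illustrative only: a saved summary of the four members' event counts
# (device names and values are the ones reported in this thread).
cat > /tmp/md-events.txt <<'EOF'
sdb1 64844 AAA.
sdc1 64844 AAA.
sde1 64841 AAAA
sdf1 64841 AAAA
EOF
# The highest event count marks the most recently written members.
max=$(awk '$2 > m { m = $2 } END { print m }' /tmp/md-events.txt)
# Members behind that count are stale and will need --force to rejoin.
awk -v m="$max" '$2 < m { print $1, "is", m - $2, "events behind" }' /tmp/md-events.txt
```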
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Request for assistance
2016-07-06 0:13 Request for assistance o1bigtenor
@ 2016-07-06 1:55 ` Adam Goryachev
2016-07-06 12:14 ` o1bigtenor
2016-07-06 7:39 ` keld
1 sibling, 1 reply; 10+ messages in thread
From: Adam Goryachev @ 2016-07-06 1:55 UTC (permalink / raw)
To: o1bigtenor, Linux-RAID
On 06/07/16 10:13, o1bigtenor wrote:
> Greetings
>
> Running a Raid 10 array with 4 - 3 TB drives. Have a UPS but this area
> gets significant lightning and also brownout (rural power) events.
>
> Found the array was read only this morning. Thought that rebooting the
> system might correct things - - - it did not as the array did not
> load.
>
> commands used followed by system response
>
> mdadm --detail /dev/md0
> mdadm: md device /dev/md0 does not appear to be active.
>
> cat /proc/mdstat
> md0 : inactive sdc1[5](S) sdf1[8](S) sde1[7](S) sdb1[4](S)
>
> mdadm -E /dev/sdb1
> sdc1
> sde1
> sdf1
>
> everything is the same except for 2 items
>
> sde and sdf have uptime listed from July 04 05:50:46
> events 64841
> array state of AAAA
>
> sdb and sdc have uptime listed from July 05 01:57:38
> events 64844
> array state of AAA.
>
>
>
> Do I just re-create the array?
>
No, not if you value your data. Only re-create the array if you are told
to by someone (knowledgeable) on the list.
In your case, I think you should stop the array.
mdadm --stop /dev/md0
Make sure there is nothing listed in /proc/mdstat
Then try to assemble the array, but force the events to match:
mdadm --assemble /dev/md0 --force /dev/sd[bcef]1
If that doesn't work, then include the output from dmesg as well as
/proc/mdstat and any commandline output generated.
You might also want to examine why two drives dropped; checking the
logs or similar might help.
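[Editor's sketch of the sequence described above, written as a dry run: the `run` wrapper only echoes each command, so nothing here touches a disk. Confirm your own device names before executing anything for real.]

```shell
# Dry-run sketch; swap the echo for real execution only deliberately.
run() { echo "+ $*"; }
run mdadm --stop /dev/md0                               # release the inactive array
run cat /proc/mdstat                                    # confirm nothing is listed
run mdadm --assemble /dev/md0 --force "/dev/sd[bcef]1"  # --force reconciles event counts
```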
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Request for assistance
2016-07-06 0:13 Request for assistance o1bigtenor
2016-07-06 1:55 ` Adam Goryachev
@ 2016-07-06 7:39 ` keld
2016-07-06 12:15 ` o1bigtenor
1 sibling, 1 reply; 10+ messages in thread
From: keld @ 2016-07-06 7:39 UTC (permalink / raw)
To: o1bigtenor; +Cc: Linux-RAID
What operating system and version are you running?
Best regards
keld
On Tue, Jul 05, 2016 at 07:13:23PM -0500, o1bigtenor wrote:
> Greetings
>
> Running a Raid 10 array with 4 - 3 TB drives. Have a UPS but this area
> gets significant lightning and also brownout (rural power) events.
>
> Found the array was read only this morning. Thought that rebooting the
> system might correct things - - - it did not as the array did not
> load.
>
> commands used followed by system response
>
> mdadm --detail /dev/md0
> mdadm: md device /dev/md0 does not appear to be active.
>
> cat /proc/mdstat
> md0 : inactive sdc1[5](S) sdf1[8](S) sde1[7](S) sdb1[4](S)
>
> mdadm -E /dev/sdb1
> sdc1
> sde1
> sdf1
>
> everything is the same except for 2 items
>
> sde and sdf have uptime listed from July 04 05:50:46
> events 64841
> array state of AAAA
>
> sdb and sdc have uptime listed from July 05 01:57:38
> events 64844
> array state of AAA.
>
>
>
> Do I just re-create the array?
>
> TIA
>
> Dee
* Re: Request for assistance
2016-07-06 1:55 ` Adam Goryachev
@ 2016-07-06 12:14 ` o1bigtenor
2016-07-06 12:51 ` Wols Lists
0 siblings, 1 reply; 10+ messages in thread
From: o1bigtenor @ 2016-07-06 12:14 UTC (permalink / raw)
To: Adam Goryachev; +Cc: Linux-RAID
On Tue, Jul 5, 2016 at 8:55 PM, Adam Goryachev
<mailinglists@websitemanagers.com.au> wrote:
> On 06/07/16 10:13, o1bigtenor wrote:
>>
>> Greetings
>>
>> Running a Raid 10 array with 4 - 3 TB drives. Have a UPS but this area
>> gets significant lightning and also brownout (rural power) events.
>>
snip
>>
>> Do I just re-create the array?
>>
> No, not if you value your data. Only re-create the array if you are told to
> by someone (knowledgeable) on the list.
>
> In your case, I think you should stop the array.
> mdadm --stop /dev/md0
> Make sure there is nothing listed in /proc/mdstat
> Then try to assemble the array, but force the events to match:
> mdadm --assemble /dev/md0 --force /dev/sd[bcef]1
>
> If that doesn't work, then include the output from dmesg as well as
> /proc/mdstat and any commandline output generated.
>
> You might also want to examine why two drives dropped, referring to logs or
> similar might assist.
>
mdadm --stop /dev/md0
cat /proc/mdstat
indicated no md devices (can't remember the exact response, but it said
nothing was there)
mdadm --assemble /dev/md0 --force /dev/sd[bcef]1 led to:
mdadm: forcing event count in /dev/sde1(2) from 64841 to 64844
mdadm: forcing event count in /dev/sdf1(3) from 64841 to 64844
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdf1
mdadm: Marking array /dev/md0 as 'clean'
mdadm: /dev/md0 has been started with 4 drives
So my array is back up - - - thank you very much for your assistance!!!
What does the 'clearing FAULTY flag . . .' message mean?
Regards
Dee
* Re: Request for assistance
2016-07-06 7:39 ` keld
@ 2016-07-06 12:15 ` o1bigtenor
0 siblings, 0 replies; 10+ messages in thread
From: o1bigtenor @ 2016-07-06 12:15 UTC (permalink / raw)
To: keld; +Cc: Linux-RAID
On Wed, Jul 6, 2016 at 2:39 AM, <keld@keldix.com> wrote:
> What operating system and version are you running?
>
Running Debian testing.
Thanks for the assistance.
Dee
* Re: Request for assistance
2016-07-06 12:14 ` o1bigtenor
@ 2016-07-06 12:51 ` Wols Lists
2016-07-06 18:28 ` o1bigtenor
0 siblings, 1 reply; 10+ messages in thread
From: Wols Lists @ 2016-07-06 12:51 UTC (permalink / raw)
To: o1bigtenor, Adam Goryachev; +Cc: Linux-RAID
On 06/07/16 13:14, o1bigtenor wrote:
> On Tue, Jul 5, 2016 at 8:55 PM, Adam Goryachev
> <mailinglists@websitemanagers.com.au> wrote:
>> On 06/07/16 10:13, o1bigtenor wrote:
>>>
>>> Greetings
>>>
>>> Running a Raid 10 array with 4 - 3 TB drives. Have a UPS but this area
>>> gets significant lightning and also brownout (rural power) events.
>>>
> snip
>>>
>>> Do I just re-create the array?
>>>
>> No, not if you value your data. Only re-create the array if you are told to
>> by someone (knowledgeable) on the list.
>>
>> In your case, I think you should stop the array.
>> mdadm --stop /dev/md0
>> Make sure there is nothing listed in /proc/mdstat
>> Then try to assemble the array, but force the events to match:
>> mdadm --assemble /dev/md0 --force /dev/sd[bcef]1
>>
>> If that doesn't work, then include the output from dmesg as well as
>> /proc/mdstat and any commandline output generated.
>>
>> You might also want to examine why two drives dropped, referring to logs or
>> similar might assist.
>>
> mdadm --stop /dev/md0
> cat /proc/mdstat
> indicated no md devices (can't remember the exact response, but it said
> nothing was there)
> mdadm --assemble /dev/md0 --force /dev/sd[bcef]1 led to:
>
> mdadm: forcing event count in /dev/sde1(2) from 64841 to 64844
> mdadm: forcing event count in /dev/sdf1(3) from 64841 to 64844
> mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdf1
> mdadm: Marking array /dev/md0 as 'clean'
> mdadm: /dev/md0 has been started with 4 drives
>
> So my array is back up - - - thank you very much for your assistance!!!
>
But why did they drop ... are you using desktop drives? I use Seagate
Barracudas - NOT a particularly good idea. You should be using WD Red,
Seagate NAS, or similar.
"smartctl -x /dev/sdx" will give you an idea of what's going on. Search
the list for "timeout error" for an idea of the grief you'll get if
you're using desktop drives ...
If smartctl says SMART is disabled, enable it. When I do, my drive comes
back (using the -x option again) saying "SCT Error Recovery not
supported". This is a no-no for a decent RAID drive. The other names for
the same feature are TLER and CCTL; either way, you can control how
quickly the drive reports an error back to the OS. Which is why you need
proper RAID drives (the manufacturers have downgraded the firmware on
desktop drives) :-(
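[Editor's sketch of the two mitigations the list usually suggests, as a dry run: the `smartctl -l scterc` syntax is real, `/dev/sdx` is a placeholder, and the echo wrapper keeps this from touching any drive. Either enable a 7-second error-recovery limit on drives that support SCT ERC, or raise the kernel's per-device command timeout on drives that don't.]

```shell
# Dry-run only: echo the commands instead of executing them.
show() { echo "# $*"; }
show smartctl -l scterc /dev/sdx                  # query current ERC settings
show smartctl -l scterc,70,70 /dev/sdx            # set read/write ERC to 7.0 s
show 'echo 180 > /sys/block/sdx/device/timeout'   # fallback for drives without ERC
```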
You need to fix the WHY or it could easily happen again. And this could
well be why ... (if you've had a problem on a desktop drive, it WILL
happen again, and data loss is quite likely ... even if you recover the
bulk of the drive).
Cheers,
Wol
* Re: Request for assistance
2016-07-06 12:51 ` Wols Lists
@ 2016-07-06 18:28 ` o1bigtenor
2016-07-06 21:31 ` Wols Lists
2016-07-07 2:05 ` Brad Campbell
0 siblings, 2 replies; 10+ messages in thread
From: o1bigtenor @ 2016-07-06 18:28 UTC (permalink / raw)
To: Wols Lists; +Cc: Adam Goryachev, Linux-RAID
On Wed, Jul 6, 2016 at 7:51 AM, Wols Lists <antlists@youngman.org.uk> wrote:
> On 06/07/16 13:14, o1bigtenor wrote:
>> On Tue, Jul 5, 2016 at 8:55 PM, Adam Goryachev
>> <mailinglists@websitemanagers.com.au> wrote:
>>> On 06/07/16 10:13, o1bigtenor wrote:
>>>>
>>>> Greetings
>>>>
>>>> Running a Raid 10 array with 4 - 3 TB drives. Have a UPS but this area
>>>> gets significant lightning and also brownout (rural power) events.
>>>>
>> snip
snip
>>
>> So my array is back up - - - thank you very much for your assistance!!!
>>
> But why did they drop ... are you using desktop drives? I use Seagate
> Barracudas - NOT a particularly good idea. You should be using WD Red,
> Seagate NAS, or similar.
Sorry - - - this system is four 1 TB WD Red drives.
>
> "smartctl -x /dev/sdx" will give you an idea of what's going on. Search
> the list for "timeout error" for an idea of the grief you'll get if
> you're using desktop drives ...
>
> If smartctl says smart is disabled, enable it. When I do, my drive comes
> back (using the -x option again) saying "SCT Error Recovery not
> supported". This is a no-no for a decent raid drive. I think the other
> acronyms are ETL or TLS - either way you can control how the drive
> reports an error back to the OS. Which is why you need proper raid
> drives (the manufacturers have downgraded the firmware on desktop drives :-(
>
> You need to fix the WHY or it could easily happen again. And this could
> well be why ... (if you've had a problem on a desktop drive, it WILL
> happen again, and data loss is quite likely ... even if you recover the
> bulk of the drive).
My best understanding as to the why is - - dirty power - - - fixing that means
going off-grid. Expensive, and not happening any time soon, although I would
really like that.
As I do not understand the error messages in smartctl, I add the following
(maybe someone would explain what they mean):
smartctl -x /dev/sdf
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.1.0-2-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD10EFRX-68FYTN0
Serial Number: WD-WCC4J4XV62F4
LU WWN Device Id: 5 0014ee 20cd9d7d1
Firmware Version: 82.00A82
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Jul 6 13:21:25 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (13320) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 152) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 139 139 021 - 4050
4 Start_Stop_Count -O--CK 100 100 000 - 23
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 100 099 000 - 423
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 6
192 Power-Off_Retract_Count -O--CK 200 200 000 - 1
193 Load_Cycle_Count -O--CK 198 198 000 - 8922
194 Temperature_Celsius -O---K 115 107 000 - 28
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 [0] occurred at disk power-on lifetime: 395 hours (16 days + 11 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 00 18 11 28 00 40 00  Error: IDNF at LBA = 0x18112800 = 403777536
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 51 78 00 e0 00 00 18 06 38 00 40 08 5d+03:01:34.882 WRITE FPDMA QUEUED
61 50 00 00 d8 00 00 18 05 e8 00 40 08 5d+03:01:34.882 WRITE FPDMA QUEUED
61 50 00 00 d0 00 00 18 05 98 00 40 08 5d+03:01:34.882 WRITE FPDMA QUEUED
61 50 00 00 c8 00 00 18 05 48 00 40 08 5d+03:01:34.882 WRITE FPDMA QUEUED
61 50 00 00 c0 00 00 18 04 f8 00 40 08 5d+03:01:34.882 WRITE FPDMA QUEUED
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 28 Celsius
Power Cycle Min/Max Temperature: 21/28 Celsius
Lifetime Min/Max Temperature: 20/36 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (237)
Index Estimated Time Temperature Celsius
238 2016-07-06 05:24 26 *******
... ..( 34 skipped). .. *******
273 2016-07-06 05:59 26 *******
274 2016-07-06 06:00 27 ********
... ..( 8 skipped). .. ********
283 2016-07-06 06:09 27 ********
284 2016-07-06 06:10 26 *******
... ..( 3 skipped). .. *******
288 2016-07-06 06:14 26 *******
289 2016-07-06 06:15 27 ********
... ..( 42 skipped). .. ********
332 2016-07-06 06:58 27 ********
333 2016-07-06 06:59 28 *********
... ..( 18 skipped). .. *********
352 2016-07-06 07:18 28 *********
353 2016-07-06 07:19 29 **********
... ..( 3 skipped). .. **********
357 2016-07-06 07:23 29 **********
358 2016-07-06 07:24 28 *********
... ..( 29 skipped). .. *********
388 2016-07-06 07:54 28 *********
389 2016-07-06 07:55 29 **********
390 2016-07-06 07:56 28 *********
391 2016-07-06 07:57 28 *********
392 2016-07-06 07:58 29 **********
393 2016-07-06 07:59 28 *********
394 2016-07-06 08:00 28 *********
395 2016-07-06 08:01 29 **********
... ..( 4 skipped). .. **********
400 2016-07-06 08:06 29 **********
401 2016-07-06 08:07 ? -
402 2016-07-06 08:08 21 **
403 2016-07-06 08:09 21 **
404 2016-07-06 08:10 21 **
405 2016-07-06 08:11 22 ***
406 2016-07-06 08:12 22 ***
407 2016-07-06 08:13 22 ***
408 2016-07-06 08:14 24 *****
409 2016-07-06 08:15 24 *****
410 2016-07-06 08:16 23 ****
411 2016-07-06 08:17 23 ****
412 2016-07-06 08:18 23 ****
413 2016-07-06 08:19 24 *****
... ..( 2 skipped). .. *****
416 2016-07-06 08:22 24 *****
417 2016-07-06 08:23 25 ******
... ..( 3 skipped). .. ******
421 2016-07-06 08:27 25 ******
422 2016-07-06 08:28 26 *******
... ..( 60 skipped). .. *******
5 2016-07-06 09:29 26 *******
6 2016-07-06 09:30 27 ********
... ..(106 skipped). .. ********
113 2016-07-06 11:17 27 ********
114 2016-07-06 11:18 26 *******
... ..(113 skipped). .. *******
228 2016-07-06 13:12 26 *******
229 2016-07-06 13:13 27 ********
... ..( 4 skipped). .. ********
234 2016-07-06 13:18 27 ********
235 2016-07-06 13:19 26 *******
236 2016-07-06 13:20 26 *******
237 2016-07-06 13:21 26 *******
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Description
1 ===== = = == General Statistics (rev 2) ==
1 0x008 4 6 Lifetime Power-On Resets
1 0x010 4 423 Power-on Hours
1 0x018 6 2044877667 Logical Sectors Written
1 0x020 6 2397939 Number of Write Commands
1 0x028 6 1961443492 Logical Sectors Read
1 0x030 6 9792433 Number of Read Commands
3 ===== = = == Rotating Media Statistics (rev 1) ==
3 0x008 4 2800 Spindle Motor Power-on Hours
3 0x010 4 1582 Head Flying Hours
3 0x018 4 8924 Head Load Events
3 0x020 4 200~ Number of Reallocated Logical Sectors
3 0x028 4 0 Read Recovery Attempts
3 0x030 4 0 Number of Mechanical Start Failures
4 ===== = = == General Errors Statistics (rev 1) ==
4 0x008 4 1 Number of Reported Uncorrectable Errors
4 0x010 4 0 Resets Between Cmd Acceptance and Completion
5 ===== = = == Temperature Statistics (rev 1) ==
5 0x008 1 28 Current Temperature
5 0x010 1 27 Average Short Term Temperature
5 0x018 1 26 Average Long Term Temperature
5 0x020 1 36 Highest Temperature
5 0x028 1 20 Lowest Temperature
5 0x030 1 33 Highest Average Short Term Temperature
5 0x038 1 22 Lowest Average Short Term Temperature
5 0x040 1 27 Highest Average Long Term Temperature
5 0x048 1 25 Lowest Average Long Term Temperature
5 0x050 4 0 Time in Over-Temperature
5 0x058 1 60 Specified Maximum Operating Temperature
5 0x060 4 0 Time in Under-Temperature
5 0x068 1 0 Specified Minimum Operating Temperature
6 ===== = = == Transport Statistics (rev 1) ==
6 0x008 4 96 Number of Hardware Resets
6 0x010 4 45 Number of ASR Events
6 0x018 4 0 Number of Interface CRC Errors
|_ ~ normalized value
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 8 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 14 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 24888 Vendor specific
* Re: Request for assistance
2016-07-06 18:28 ` o1bigtenor
@ 2016-07-06 21:31 ` Wols Lists
2016-07-07 2:05 ` Brad Campbell
1 sibling, 0 replies; 10+ messages in thread
From: Wols Lists @ 2016-07-06 21:31 UTC (permalink / raw)
To: o1bigtenor; +Cc: Adam Goryachev, Linux-RAID
On 06/07/16 19:28, o1bigtenor wrote:
> SCT Error Recovery Control:
> Read: 70 (7.0 seconds)
> Write: 70 (7.0 seconds)
As soon as you said WD Red, that said the drives are good. The SCT
output says the drives will wait at most 7 seconds before returning a
problem, which is what you want (my Barracudas can't do that - a problem
waiting to happen).
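[Editor's note, hedged: the raw SCT ERC value smartctl reports is in tenths of a second, so the 70 in the output above is 7.0 seconds, sitting well under the Linux SCSI layer's default 30-second command timeout; the drive gives up and reports first, which is the whole point.]

```shell
# Convert the raw SCT ERC value (units of 0.1 s) to seconds.
erc_raw=70
awk -v e="$erc_raw" 'BEGIN { printf "ERC limit: %.1f s (vs 30 s default kernel timeout)\n", e / 10 }'
```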
I'll let someone who knows more comment on the rest of the output, but
that SCT stuff tells us your problem is not the usual one of someone
using the wrong drives.
Cheers,
Wol
* Re: Request for assistance
2016-07-06 18:28 ` o1bigtenor
2016-07-06 21:31 ` Wols Lists
@ 2016-07-07 2:05 ` Brad Campbell
2016-07-07 3:28 ` o1bigtenor
1 sibling, 1 reply; 10+ messages in thread
From: Brad Campbell @ 2016-07-07 2:05 UTC (permalink / raw)
To: o1bigtenor; +Cc: Linux-RAID
On 07/07/16 02:28, o1bigtenor wrote:
> My best understanding as to the why is - - dirty power - - - fixing that means
> going off-grid. Expensive and not happening any time soon although I would
> really like that.
>
Get a UPS.
Get a UPS.
Get a UPS.
Get a UPS.
I've got some nice full on-line double conversion units, but they are
noisy and less efficient. In my experience, a second hand APC SmartUPS
will sort enough of the most revolting power to keep things running
smoothly, and they are CHEAP. Despite owning several expensive UPS
units, all my stuff is behind a couple of second hand SmartUPS.
My last purchase saw me pick up 5 decent line interactive UPS units for
about $25 each as a job lot. New batteries for one were less than $100
(same brand as the UPS comes with) from the local wholesaler. I get 4-5
years out of a set of batteries.
If you had to budget for time, one blip on a RAID and the associated
recovery pays for the UPS. Cheap insurance.
Regards,
Brad
* Re: Request for assistance
2016-07-07 2:05 ` Brad Campbell
@ 2016-07-07 3:28 ` o1bigtenor
0 siblings, 0 replies; 10+ messages in thread
From: o1bigtenor @ 2016-07-07 3:28 UTC (permalink / raw)
To: Brad Campbell; +Cc: Linux-RAID
On Wed, Jul 6, 2016 at 9:05 PM, Brad Campbell <lists2009@fnarfbargle.com> wrote:
> On 07/07/16 02:28, o1bigtenor wrote:
>
>> My best understanding as to the why is - - dirty power - - - fixing that
>> means
>> going off-grid. Expensive and not happening any time soon although I would
>> really like that.
>>
>
> Get a UPS.
> Get a UPS.
> Get a UPS.
> Get a UPS.
Hmmmmmmmmmmm - - - got one. Working on getting a bigger one set up, as
maybe the first one isn't big enough. Have also found out that voltage
spikes degrade surge protectors, and not necessarily all at once - - they
die a little with each 'use'. It is frustrating to have such 'dirty' power. Even
better, the CSA standards for voltage are so sloppy that electronics
die early (and often) when you are in rural country.
>
> I've got some nice full on-line double conversion units, but they are noisy
> and less efficient. In my experience, a second hand APC SmartUPS will sort
> enough of the most revolting power to keep things running smoothly, and they
> are CHEAP. Despite owning several expensive UPS units, all my stuff is
> behind a couple of second hand SmartUPS.
>
> My last purchase saw me pick up 5 decent line interactive UPS units for
> about $25 each as a job lot. New batteries for one were less than $100 (same
> brand as the UPS comes with) from the local wholesaler. I get 4-5 years out
> of a set of batteries.
>
> If you had to budget for time, one blip on a RAID and the associated
> recovery pays for the UPS. Cheap insurance.
>
Working on more. Haven't found too many of those 'reasonable' UPSes
though.
Regards
Dee
Thread overview: 10+ messages
2016-07-06 0:13 Request for assistance o1bigtenor
2016-07-06 1:55 ` Adam Goryachev
2016-07-06 12:14 ` o1bigtenor
2016-07-06 12:51 ` Wols Lists
2016-07-06 18:28 ` o1bigtenor
2016-07-06 21:31 ` Wols Lists
2016-07-07 2:05 ` Brad Campbell
2016-07-07 3:28 ` o1bigtenor
2016-07-06 7:39 ` keld
2016-07-06 12:15 ` o1bigtenor