Re: Kernel bug during RAID1 replace

From: Saint Germain <saintger@gmail.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Kernel bug during RAID1 replace
Date: Tue, 28 Jun 2016 01:03:39 +0200
Message-ID: <20160628010339.7fcddf5e@system>
In-Reply-To: <CAJCQCtR3FQ-Pe40dOcKGgd+8paraHuCoNCw5Kk+2G7bOOJF1rw@mail.gmail.com>

On Mon, 27 Jun 2016 16:55:07 -0600, Chris Murphy
<lists@colorremedies.com> wrote:

> On Mon, Jun 27, 2016 at 4:26 PM, Saint Germain <saintger@gmail.com>
> wrote:
> 
> >>
> >
> > Thanks for your help.
> >
> > Ok here is the log from the mounting, and including btrfs replace
> > (btrfs replace start -f /dev/sda1 /dev/sdd1 /home):
> >
> > BTRFS info (device sdb1): disk space caching is enabled
> > BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 12, flush 7928, corrupt 1705631, gen 1335
> > BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14220, gen 24
> 
> Eek. So sdb has 11+ million write errors, flush errors, read errors,
> and over 1 million corruptions. It's dying or dead.
> 
> And sda has a dozen thousand+ corruptions. This isn't a good
> combination, as you have two devices with problems and raid5 only
> protects you from one device with problems.
> 
> You were in the process of replacing sda, which is good, but it may
> not be enough...
> 
> 
> > BTRFS info (device sdb1): dev_replace from /dev/sda1 (devid 1) to /dev/sdd1 started
> > scrub_handle_errored_block: 166 callbacks suppressed
> > BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev /dev/sda1, sector 77669048, root 5, inode 3434831, offset 479232, length 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)
> > btrfs_dev_stat_print_on_error: 166 callbacks suppressed
> > BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14221, gen 24
> > scrub_handle_errored_block: 166 callbacks suppressed
> > BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445255168 on dev /dev/sda1
> 
> Shoot. You have a lot of these. It looks suspiciously like you're
> hitting a case list regulars are only just starting to understand
> (somewhat) where it's possible to have a legit corrupt sector that
> Btrfs detects during scrub as wrong, fixes it from parity, but then
> occasionally wrongly overwrites the parity with bad parity. This
> doesn't cause an immediately recognizable problem. But if the volume
> becomes degraded later, Btrfs must use parity to reconstruct
> on-the-fly and if it hits one of these bad parities, the
> reconstruction is bad, and ends up causing lots of these checksum
> errors. We can tell it's not metadata corruption because a.) there's a
> file listed as being affected and b.) the file system doesn't fail and
> go read only. But still it means those files are likely toast...
> 
> 
> [...snip many instances of checksum errors...]
> 
> > BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 16217, gen 24
> > ata2.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x0
> > ata2.00: irq_stat 0x40000008
> > ata2.00: failed command: READ FPDMA QUEUED
> > ata2.00: cmd 60/08:70:08:d8:70/00:00:0f:00:00/40 tag 14 ncq 4096 in
> >          res 41/40:00:08:d8:70/00:00:0f:00:00/40 Emask 0x409 (media error) <F>
> > ata2.00: status: { DRDY ERR }
> > ata2.00: error: { UNC }
> > ata2.00: configured for UDMA/133
> > sd 1:0:0:0: [sdb] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > sd 1:0:0:0: [sdb] tag#14 Sense Key : Medium Error [current] [descriptor]
> > sd 1:0:0:0: [sdb] tag#14 Add. Sense: Unrecovered read error - auto reallocate failed
> > sd 1:0:0:0: [sdb] tag#14 CDB: Read(10) 28 00 0f 70 d8 08 00 00 08 00
> > blk_update_request: I/O error, dev sdb, sector 259053576
> 
> OK yeah so bad sector on sdb. So you have two failures because sda is
> already giving you trouble while being replaced and on top of it you
> now get a 2nd (partial) failure via bad sectors.
> 
> So rather urgently I think you need to copy things off this volume if
> you don't already have a backup so you can save as much as possible.
> Don't write to the drives. You might even consider 'mount -o
> remount,ro' to avoid anything writing to the volume. Copy the most
> important data first, triage time.
> 
> While that happens you can safely collect some more information:
> 
> btrfs fi us <mp>
> smartctl -x <dev>   ## for both drives
> 

OK, thanks. I will start making an image with dd.
Do you recommend imaging sda or sdb?
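
For reference, a minimal Python sketch of the kind of block-wise copy
that imaging a drive with unreadable sectors involves: each block is
read on its own, and a block that fails with an I/O error is replaced
by zeros so that later offsets in the image stay aligned. The names
image_device and read_exact and the 4096-byte block size are
assumptions made for illustration (4096 matches the physical sector
size smartctl reports below). This is not a tested recovery tool.

```python
import os


def read_exact(f, offset, n):
    """Read up to n bytes at offset, looping over short reads until EOF."""
    f.seek(offset)
    buf = b""
    while len(buf) < n:
        part = f.read(n - len(buf))
        if not part:  # end of file or device
            break
        buf += part
    return buf


def image_device(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that raised an I/O error.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        # Seeking to the end gives the size for regular files and block devices.
        size = src.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                chunk = read_exact(src, offset, want)
            except OSError:
                chunk = b""
                bad_offsets.append(offset)
            # Pad failed reads with zeros so the image keeps the same layout.
            dst.write(chunk.ljust(want, b"\0"))
            offset += want
    return bad_offsets
```

Called as image_device("/dev/sdX1", "/mnt/backup/sdX1.img"), where both
paths are placeholders, it would return the offsets that had to be
zero-filled, which can then be compared against the sectors the kernel
log complains about.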

In the meantime, here is the requested information:

btrfs fi us /home
Overall:
    Device size:                   3.63TiB
    Device allocated:              2.76TiB
    Device unallocated:          888.51GiB
    Device missing:                  0.00B
    Used:                          2.62TiB
    Free (estimated):            517.56GiB      (min: 517.56GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:1.38TiB, Used:1.31TiB
   /dev/sda1       1.38TiB
   /dev/sdb1       1.38TiB

Metadata,RAID1: Size:5.00GiB, Used:3.15GiB
   /dev/sda1       5.00GiB
   /dev/sdb1       5.00GiB

System,RAID1: Size:64.00MiB, Used:216.00KiB
   /dev/sda1      64.00MiB
   /dev/sdb1      64.00MiB

Unallocated:
   /dev/sda1     444.26GiB
   /dev/sdb1     444.26GiB

root@system:/# smartctl -x /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000LM003 HN-M201RAD
Serial Number:    S377J9DGB01499
LU WWN Device Id: 5 0004cf 2111fa028
Firmware Version: 2BE10001
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jun 28 00:59:36 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 254
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (  38)	The self-test routine was interrupted
					by the host with a hard or soft reset.
Total time to complete Offline 
data collection: 		(23100) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 385) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   051    -    0
  2 Throughput_Performance  -OS--K   252   252   000    -    0
  3 Spin_Up_Time            PO---K   091   090   025    -    2993
  4 Start_Stop_Count        -O--CK   100   100   000    -    661
  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
  7 Seek_Error_Rate         -OSR-K   252   252   051    -    0
  8 Seek_Time_Performance   --S--K   252   252   015    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    1379
 10 Spin_Retry_Count        -O--CK   252   252   051    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    349
191 G-Sense_Error_Rate      -O---K   252   252   000    -    0
192 Power-Off_Retract_Count -O---K   252   252   000    -    0
194 Temperature_Celsius     -O----   060   047   000    -    40 (Min/Max 18/53)
195 Hardware_ECC_Recovered  -O-RCK   100   100   000    -    0
196 Reallocated_Event_Count -O--CK   252   252   000    -    0
197 Current_Pending_Sector  -O--CK   252   252   000    -    0
198 Offline_Uncorrectable   ----CK   252   252   000    -    0
199 UDMA_CRC_Error_Count    -OS-CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   -O-R-K   100   100   000    -    2
223 Load_Retry_Count        -O--CK   100   100   000    -    1
225 Load_Cycle_Count        -O--CK   099   099   000    -    10744
241 Total_LBAs_Written      -O--CK   095   094   000    -    7981553
242 Total_LBAs_Read         -O--CK   098   094   000    -    4015781
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x03       GPL     R/O      2  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      2  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xc0-0xdf  GPL,SL  VS      16  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (2 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Offline             Interrupted (host reset)      60%      1372         -
# 2  Short captive       Self-test routine in progress 60%      1372         -
# 3  Offline             Interrupted (host reset)      60%      1372         -
# 4  Short captive       Self-test routine in progress 60%      1372         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Interrupted [60% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  2
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    40 Celsius
Power Cycle Min/Max Temperature:     23/45 Celsius
Lifetime    Min/Max Temperature:     18/53 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         5 minutes
Temperature Logging Interval:        5 minutes
Min/Max recommended Temperature:     -5/80 Celsius
Min/Max Temperature Limit:           -10/85 Celsius
Temperature History Size (Index):    128 (60)

Index    Estimated Time   Temperature Celsius
  61    2016-06-27 14:20    40  *********************
  62    2016-06-27 14:25    37  ******************
  63    2016-06-27 14:30    37  ******************
  64    2016-06-27 14:35    37  ******************
  65    2016-06-27 14:40    38  *******************
  66    2016-06-27 14:45    38  *******************
  67    2016-06-27 14:50    39  ********************
  68    2016-06-27 14:55    40  *********************
  69    2016-06-27 15:00    40  *********************
  70    2016-06-27 15:05    41  **********************
  71    2016-06-27 15:10    41  **********************
  72    2016-06-27 15:15    26  *******
  73    2016-06-27 15:20    28  *********
  74    2016-06-27 15:25    30  ***********
  75    2016-06-27 15:30    32  *************
  76    2016-06-27 15:35    34  ***************
  77    2016-06-27 15:40    34  ***************
  78    2016-06-27 15:45    35  ****************
  79    2016-06-27 15:50    36  *****************
  80    2016-06-27 15:55    37  ******************
  81    2016-06-27 16:00    38  *******************
  82    2016-06-27 16:05    39  ********************
  83    2016-06-27 16:10    26  *******
  84    2016-06-27 16:15    29  **********
  85    2016-06-27 16:20    30  ***********
  86    2016-06-27 16:25    32  *************
  87    2016-06-27 16:30    35  ****************
  88    2016-06-27 16:35    37  ******************
  89    2016-06-27 16:40    39  ********************
  90    2016-06-27 16:45    41  **********************
  91    2016-06-27 16:50    43  ************************
  92    2016-06-27 16:55    45  **************************
  93    2016-06-27 17:00    46  ***************************
  94    2016-06-27 17:05    47  ****************************
  95    2016-06-27 17:10    47  ****************************
  96    2016-06-27 17:15    48  *****************************
  97    2016-06-27 17:20    48  *****************************
  98    2016-06-27 17:25    48  *****************************
  99    2016-06-27 17:30    49  ******************************
 ...    ..(  3 skipped).    ..  ******************************
 103    2016-06-27 17:50    49  ******************************
 104    2016-06-27 17:55    44  *************************
 105    2016-06-27 18:00    43  ************************
 106    2016-06-27 18:05    42  ***********************
 107    2016-06-27 18:10    44  *************************
 108    2016-06-27 18:15    45  **************************
 109    2016-06-27 18:20    45  **************************
 110    2016-06-27 18:25    44  *************************
 111    2016-06-27 18:30    41  **********************
 112    2016-06-27 18:35    40  *********************
 113    2016-06-27 18:40    39  ********************
 114    2016-06-27 18:45    39  ********************
 115    2016-06-27 18:50    38  *******************
 ...    ..(  2 skipped).    ..  *******************
 118    2016-06-27 19:05    38  *******************
 119    2016-06-27 19:10    41  **********************
 120    2016-06-27 19:15    43  ************************
 121    2016-06-27 19:20    43  ************************
 122    2016-06-27 19:25    40  *********************
 123    2016-06-27 19:30    39  ********************
 124    2016-06-27 19:35    39  ********************
 125    2016-06-27 19:40    39  ********************
 126    2016-06-27 19:45    40  *********************
 ...    ..(  5 skipped).    ..  *********************
   4    2016-06-27 20:15    40  *********************
   5    2016-06-27 20:20    39  ********************
 ...    ..(  3 skipped).    ..  ********************
   9    2016-06-27 20:40    39  ********************
  10    2016-06-27 20:45    38  *******************
  11    2016-06-27 20:50    38  *******************
  12    2016-06-27 20:55    41  **********************
  13    2016-06-27 21:00    44  *************************
  14    2016-06-27 21:05    44  *************************
  15    2016-06-27 21:10    41  **********************
  16    2016-06-27 21:15    40  *********************
  17    2016-06-27 21:20    40  *********************
  18    2016-06-27 21:25    43  ************************
  19    2016-06-27 21:30    43  ************************
  20    2016-06-27 21:35    27  ********
  21    2016-06-27 21:40    29  **********
  22    2016-06-27 21:45    30  ***********
  23    2016-06-27 21:50    34  ***************
  24    2016-06-27 21:55    35  ****************
  25    2016-06-27 22:00    36  *****************
  26    2016-06-27 22:05    36  *****************
  27    2016-06-27 22:10    37  ******************
 ...    ..(  3 skipped).    ..  ******************
  31    2016-06-27 22:30    37  ******************
  32    2016-06-27 22:35    38  *******************
  33    2016-06-27 22:40    38  *******************
  34    2016-06-27 22:45    41  **********************
  35    2016-06-27 22:50    43  ************************
  36    2016-06-27 22:55    41  **********************
  37    2016-06-27 23:00    39  ********************
  38    2016-06-27 23:05    39  ********************
  39    2016-06-27 23:10    39  ********************
  40    2016-06-27 23:15    40  *********************
  41    2016-06-27 23:20    43  ************************
  42    2016-06-27 23:25    44  *************************
  43    2016-06-27 23:30    45  **************************
  44    2016-06-27 23:35    42  ***********************
  45    2016-06-27 23:40    40  *********************
  46    2016-06-27 23:45    39  ********************
  47    2016-06-27 23:50    39  ********************
  48    2016-06-27 23:55    38  *******************
 ...    ..(  2 skipped).    ..  *******************
  51    2016-06-28 00:10    38  *******************
  52    2016-06-28 00:15    41  **********************
  53    2016-06-28 00:20    43  ************************
  54    2016-06-28 00:25    44  *************************
  55    2016-06-28 00:30    41  **********************
  56    2016-06-28 00:35    40  *********************
 ...    ..(  3 skipped).    ..  *********************
  60    2016-06-28 00:55    40  *********************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0003  4            0  R_ERR response for device-to-host data FIS
0x0004  4            0  R_ERR response for host-to-device data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x0006  4            0  R_ERR response for device-to-host non-data FIS
0x0007  4            0  R_ERR response for host-to-device non-data FIS
0x0008  4            0  Device-to-host non-data FIS retries
0x0009  4            5  Transition from drive PhyRdy to drive PhyNRdy
0x000a  4            5  Device-to-host register FISes sent due to a COMRESET
0x000b  4            0  CRC errors within host-to-device FIS
0x000d  4            0  Non-CRC errors within host-to-device FIS
0x000f  4            0  R_ERR response for host-to-device data FIS, CRC
0x0010  4            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  4            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  4            0  R_ERR response for host-to-device non-data FIS, non-CRC
0x8e00  4            0  Vendor specific
0x8e01  4            0  Vendor specific
0x8e02  4            0  Vendor specific
0x8e03  4            0  Vendor specific
0x8e04  4            0  Vendor specific
0x8e05  4            0  Vendor specific
0x8e06  4            0  Vendor specific
0x8e07  4            0  Vendor specific
0x8e08  4            0  Vendor specific
0x8e09  4            0  Vendor specific
0x8e0a  4            0  Vendor specific
0x8e0b  4            0  Vendor specific
0x8e0c  4            0  Vendor specific
0x8e0d  4            0  Vendor specific
0x8e0e  4            0  Vendor specific
0x8e0f  4            0  Vendor specific
0x8e10  4            0  Vendor specific
0x8e11  4            0  Vendor specific

root@system:/# smartctl -x /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000LM003 HN-M201RAD
Serial Number:    S377J9CGB01582
LU WWN Device Id: 5 0004cf 211210d5b
Firmware Version: 2BE10001
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jun 28 00:59:37 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 254
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		(21420) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 357) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   051    -    28
  2 Throughput_Performance  -OS--K   252   252   000    -    0
  3 Spin_Up_Time            PO---K   092   083   025    -    2678
  4 Start_Stop_Count        -O--CK   100   100   000    -    575
  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
  7 Seek_Error_Rate         -OSR-K   252   252   051    -    0
  8 Seek_Time_Performance   --S--K   252   252   015    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    1391
 10 Spin_Retry_Count        -O--CK   252   252   051    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    371
191 G-Sense_Error_Rate      -O---K   252   252   000    -    0
192 Power-Off_Retract_Count -O---K   252   252   000    -    0
194 Temperature_Celsius     -O----   061   047   000    -    39 (Min/Max 19/53)
195 Hardware_ECC_Recovered  -O-RCK   100   100   000    -    0
196 Reallocated_Event_Count -O--CK   252   252   000    -    0
197 Current_Pending_Sector  -O--CK   100   100   000    -    1
198 Offline_Uncorrectable   ----CK   252   252   000    -    0
199 UDMA_CRC_Error_Count    -OS-CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   -O-R-K   100   100   000    -    3
223 Load_Retry_Count        -O--CK   100   100   000    -    1
225 Load_Cycle_Count        -O--CK   099   099   000    -    13957
241 Total_LBAs_Written      -O--CK   096   094   000    -    6153920
242 Total_LBAs_Read         -O--CK   097   094   000    -    4873960
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x03       GPL     R/O      2  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      2  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xc0-0xdf  GPL,SL  VS      16  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 28 (device log contains only the most recent 8 errors)
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 28 [3] occurred at disk power-on lifetime: 1390 hours (57 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:21.275  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d8 08 40 08     00:00:21.279  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d8 00 40 08     00:00:21.279  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d7 f8 40 08     00:00:21.279  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d7 f0 40 08     00:00:21.279  READ FPDMA QUEUED

Error 27 [2] occurred at disk power-on lifetime: 1390 hours (57 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 05 80 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 06 60 00 00 0f 70 d7 28 40 08     00:00:21.275  READ FPDMA QUEUED
  60 00 00 06 60 00 00 0f 70 d7 28 40 08     00:00:21.276  READ FPDMA QUEUED
  60 00 00 07 00 00 00 0f 70 d0 28 40 08     00:00:21.276  READ FPDMA QUEUED
  60 00 00 0a 00 00 00 0f 70 c6 28 40 08     00:00:21.276  READ FPDMA QUEUED
  60 00 00 0a 00 00 00 0f 70 bc 28 40 08     00:00:21.276  READ FPDMA QUEUED

Error 26 [1] occurred at disk power-on lifetime: 1390 hours (57 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:19.046  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:19.051  READ FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 e0 08     00:00:19.051  FLUSH CACHE EXT
  61 00 00 00 08 00 00 20 00 08 00 40 08     00:00:19.051  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 00 02 08 00 40 08     00:00:19.051  WRITE FPDMA QUEUED

Error 25 [0] occurred at disk power-on lifetime: 1388 hours (57 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:11.810  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:11.816  READ FPDMA QUEUED
  60 00 00 00 08 00 00 07 9c d9 20 40 08     00:00:11.811  READ FPDMA QUEUED
  60 00 00 00 08 00 00 3e 60 1a 90 40 08     00:00:11.811  READ FPDMA QUEUED
  60 00 00 00 08 00 00 07 9c 68 f0 40 08     00:00:11.811  READ FPDMA QUEUED

Error 24 [7] occurred at disk power-on lifetime: 1387 hours (57 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:09.010  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:09.066  READ FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 e0 08     00:00:09.051  FLUSH CACHE EXT
  61 00 00 00 08 00 00 20 00 08 00 40 08     00:00:09.051  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 00 02 08 00 40 08     00:00:09.051  WRITE FPDMA QUEUED

Error 23 [6] occurred at disk power-on lifetime: 1385 hours (57 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:01.685  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d8 08 40 08     00:00:01.688  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d8 00 40 08     00:00:01.688  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d7 f8 40 08     00:00:01.688  READ FPDMA QUEUED
  60 00 10 00 08 00 00 0f 70 d7 f0 40 08     00:00:01.688  READ FPDMA QUEUED

Error 22 [5] occurred at disk power-on lifetime: 1385 hours (57 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 05 80 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 06 60 00 00 0f 70 d7 28 40 08     00:00:01.685  READ FPDMA QUEUED
  60 00 00 06 60 00 00 0f 70 d7 28 40 08     00:00:01.685  READ FPDMA QUEUED
  60 00 00 08 00 00 00 0f 70 cf 28 40 08     00:00:01.685  READ FPDMA QUEUED
  60 00 00 05 00 00 00 0f 70 ca 28 40 08     00:00:01.685  READ FPDMA QUEUED
  60 00 00 07 00 00 00 0f 70 c3 28 40 08     00:00:01.685  READ FPDMA QUEUED

Error 21 [4] occurred at disk power-on lifetime: 1384 hours (57 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 08 00 00 0f 70 d8 08 40 08     00:00:23.754  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 db 30 40 08     00:00:23.774  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 d8 30 40 08     00:00:23.774  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 d9 30 40 08     00:00:23.774  READ FPDMA QUEUED
  60 00 00 00 08 00 00 0f 70 da 30 40 08     00:00:23.774  READ FPDMA QUEUED
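
For anyone cross-checking the register dumps above, here is a minimal Python sketch of how the "registers were" rows map to the LBA that smartctl prints. It assumes the column layout shown in the header line (ER, a reserved byte, ST, two COUNT bytes, six LBA bytes, DV, DC). The function name `decode_ata_registers` is made up for illustration and is not part of smartmontools.

```python
def decode_ata_registers(line: str) -> dict:
    """Decode one 'registers were' row from the smartctl extended error log."""
    # Keep the 13 leading tokens; anything after them is smartctl's annotation.
    tokens = line.split()[:13]
    # Assumed layout (from the header): ER -- ST COUNT(2) LBA_48(3)+LH LM LL DV DC
    error = int(tokens[0], 16)
    status = int(tokens[2], 16)
    count = int("".join(tokens[3:5]), 16)   # 16-bit sector count
    lba = int("".join(tokens[5:11]), 16)    # 48-bit LBA, most significant byte first
    return {
        "error": error,
        "status": status,
        "count": count,
        "lba": lba,
        "unc": bool(error & 0x40),  # ATA error register bit 6: uncorrectable data
        "err": bool(status & 0x01), # ATA status register bit 0: error
    }


row = "40 -- 41 00 08 00 00 0f 70 d8 08 40 00  Error: UNC at LBA = 0x0f70d808 = 259053576"
regs = decode_ata_registers(row)
print(regs["lba"], regs["count"], regs["unc"])  # 259053576 8 True
```

All five logged errors decode to the same LBA, which is also the LBA_of_first_error reported by the short self-tests.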

SMART Extended Self-test Log Version: 1 (2 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%      1384         259053576
# 2  Short captive       Completed: read failure       90%      1384         259053576

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed_read_failure [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  2
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    38 Celsius
Power Cycle Min/Max Temperature:     23/42 Celsius
Lifetime    Min/Max Temperature:     19/53 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         5 minutes
Temperature Logging Interval:        5 minutes
Min/Max recommended Temperature:     -5/80 Celsius
Min/Max Temperature Limit:           -10/85 Celsius
Temperature History Size (Index):    128 (17)

Index    Estimated Time   Temperature Celsius
  18    2016-06-27 14:20    38  *******************
  19    2016-06-27 14:25    36  *****************
 ...    ..( 17 skipped).    ..  *****************
  37    2016-06-27 15:55    36  *****************
  38    2016-06-27 16:00    35  ****************
  39    2016-06-27 16:05    36  *****************
  40    2016-06-27 16:10    35  ****************
  41    2016-06-27 16:15    35  ****************
  42    2016-06-27 16:20    32  *************
  43    2016-06-27 16:25    33  **************
  44    2016-06-27 16:30    34  ***************
  45    2016-06-27 16:35    34  ***************
  46    2016-06-27 16:40    35  ****************
  47    2016-06-27 16:45    27  ********
  48    2016-06-27 16:50    29  **********
  49    2016-06-27 16:55    30  ***********
  50    2016-06-27 17:00    34  ***************
  51    2016-06-27 17:05    37  ******************
  52    2016-06-27 17:10    39  ********************
  53    2016-06-27 17:15    40  *********************
  54    2016-06-27 17:20    41  **********************
  55    2016-06-27 17:25    42  ***********************
  56    2016-06-27 17:30    42  ***********************
  57    2016-06-27 17:35    43  ************************
  58    2016-06-27 17:40    43  ************************
  59    2016-06-27 17:45    44  *************************
  60    2016-06-27 17:50    44  *************************
  61    2016-06-27 17:55    44  *************************
  62    2016-06-27 18:00    41  **********************
  63    2016-06-27 18:05    40  *********************
  64    2016-06-27 18:10    39  ********************
  65    2016-06-27 18:15    41  **********************
  66    2016-06-27 18:20    42  ***********************
  67    2016-06-27 18:25    43  ************************
  68    2016-06-27 18:30    43  ************************
  69    2016-06-27 18:35    38  *******************
  70    2016-06-27 18:40    37  ******************
  71    2016-06-27 18:45    36  *****************
 ...    ..(  2 skipped).    ..  *****************
  74    2016-06-27 19:00    36  *****************
  75    2016-06-27 19:05    35  ****************
  76    2016-06-27 19:10    36  *****************
  77    2016-06-27 19:15    40  *********************
  78    2016-06-27 19:20    41  **********************
  79    2016-06-27 19:25    41  **********************
  80    2016-06-27 19:30    38  *******************
  81    2016-06-27 19:35    37  ******************
  82    2016-06-27 19:40    37  ******************
  83    2016-06-27 19:45    37  ******************
  84    2016-06-27 19:50    38  *******************
 ...    ..(  4 skipped).    ..  *******************
  89    2016-06-27 20:15    38  *******************
  90    2016-06-27 20:20    37  ******************
  91    2016-06-27 20:25    38  *******************
  92    2016-06-27 20:30    37  ******************
 ...    ..(  2 skipped).    ..  ******************
  95    2016-06-27 20:45    37  ******************
  96    2016-06-27 20:50    36  *****************
  97    2016-06-27 20:55    39  ********************
  98    2016-06-27 21:00    41  **********************
  99    2016-06-27 21:05    42  ***********************
 100    2016-06-27 21:10    42  ***********************
 101    2016-06-27 21:15    39  ********************
 102    2016-06-27 21:20    38  *******************
 103    2016-06-27 21:25    39  ********************
 104    2016-06-27 21:30    41  **********************
 105    2016-06-27 21:35    39  ********************
 106    2016-06-27 21:40    27  ********
 107    2016-06-27 21:45    36  *****************
 108    2016-06-27 21:50    35  ****************
 109    2016-06-27 21:55    34  ***************
 110    2016-06-27 22:00    34  ***************
 111    2016-06-27 22:05    35  ****************
 ...    ..(  2 skipped).    ..  ****************
 114    2016-06-27 22:20    35  ****************
 115    2016-06-27 22:25    36  *****************
 ...    ..(  4 skipped).    ..  *****************
 120    2016-06-27 22:50    36  *****************
 121    2016-06-27 22:55    37  ******************
 122    2016-06-27 23:00    39  ********************
 123    2016-06-27 23:05    37  ******************
 124    2016-06-27 23:10    37  ******************
 125    2016-06-27 23:15    38  *******************
 ...    ..(  2 skipped).    ..  *******************
   0    2016-06-27 23:30    38  *******************
   1    2016-06-27 23:35    39  ********************
   2    2016-06-27 23:40    39  ********************
   3    2016-06-27 23:45    38  *******************
   4    2016-06-27 23:50    37  ******************
   5    2016-06-27 23:55    37  ******************
   6    2016-06-28 00:00    37  ******************
   7    2016-06-28 00:05    36  *****************
   8    2016-06-28 00:10    36  *****************
   9    2016-06-28 00:15    36  *****************
  10    2016-06-28 00:20    37  ******************
  11    2016-06-28 00:25    37  ******************
  12    2016-06-28 00:30    38  *******************
  13    2016-06-28 00:35    38  *******************
  14    2016-06-28 00:40    37  ******************
  15    2016-06-28 00:45    42  ***********************
  16    2016-06-28 00:50    42  ***********************
  17    2016-06-28 00:55    39  ********************
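
The history above is a ring buffer: 128 slots, one sample every 5 minutes, with slot 17 being the newest and slot 18 the oldest. A small sketch of how the "Estimated Time" column can be reproduced from the slot index, assuming the newest sample was taken at 2016-06-28 00:55 as listed; `sample_time` is a hypothetical helper, not a smartctl function.

```python
from datetime import datetime, timedelta


def sample_time(index, current_index, current_time, size=128, period_min=5):
    """Estimate when ring-buffer slot `index` was written."""
    # Sampling periods between this slot and the newest one, with wraparound.
    age = (current_index - index) % size
    return current_time - timedelta(minutes=age * period_min)


newest = datetime(2016, 6, 28, 0, 55)  # slot 17 in the table above
oldest = sample_time(18, 17, newest)
slot0 = sample_time(0, 17, newest)
print(oldest)  # 2016-06-27 14:20:00
print(slot0)   # 2016-06-27 23:30:00
```

So the table covers roughly the last 10.5 hours of drive temperature.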

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0003  4            0  R_ERR response for device-to-host data FIS
0x0004  4            0  R_ERR response for host-to-device data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x0006  4            0  R_ERR response for device-to-host non-data FIS
0x0007  4            0  R_ERR response for host-to-device non-data FIS
0x0008  4            0  Device-to-host non-data FIS retries
0x0009  4            6  Transition from drive PhyRdy to drive PhyNRdy
0x000a  4            5  Device-to-host register FISes sent due to a COMRESET
0x000b  4            0  CRC errors within host-to-device FIS
0x000d  4            0  Non-CRC errors within host-to-device FIS
0x000f  4            0  R_ERR response for host-to-device data FIS, CRC
0x0010  4            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  4            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  4            0  R_ERR response for host-to-device non-data FIS, non-CRC
0x8e00  4            0  Vendor specific
0x8e01  4            0  Vendor specific
0x8e02  4            0  Vendor specific
0x8e03  4            0  Vendor specific
0x8e04  4            0  Vendor specific
0x8e05  4            0  Vendor specific
0x8e06  4        82711  Vendor specific
0x8e07  4     36347773  Vendor specific
0x8e08  4          144  Vendor specific
0x8e09  4        43703  Vendor specific
0x8e0a  4      2848493  Vendor specific
0x8e0b  4       193920  Vendor specific
0x8e0c  4     10284463  Vendor specific
0x8e0d  4        74795  Vendor specific
0x8e0e  4       935421  Vendor specific
0x8e0f  4    131112960  Vendor specific
0x8e10  4      3741684  Vendor specific
0x8e11  4            7  Vendor specific
