linux-btrfs.vger.kernel.org archive mirror
* Btrfs duperemove corrupt data while dedup
@ 2015-08-26 19:33 Timofey Titovets
  2015-08-26 19:52 ` Roman Mamedov
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Timofey Titovets @ 2015-08-26 19:33 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1985 bytes --]

Hello guys,
I like btrfs and want to put it in production soon.
One of the features I want to use is deduplication.

I frequently test duperemove on btrfs and have already seen this problem before.
I know that btrfs used to change mtime while deduping, but after the dedup
fixes from Mark (https://github.com/markfasheh), I tried comparing
checksums before and after dedup.

As far as I know, duperemove uses a kernel ioctl for deduping, so this is
not a duperemove issue: the kernel must keep the data consistent.

The file system is fresh and btrfs check does not show any metadata corruption.

Github issue:
https://github.com/markfasheh/duperemove/issues/91

System info:
$ uname -a
Linux titovetst-beplan 4.2.0-rc8-next-20150825-0959-ARCH #1 SMP Wed Aug 26 10:27:18 MSK 2015 x86_64 GNU/Linux

Mount options:
rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@home

Okay, here is how I found it:

md5sum_recursive(){
        # Quote "$@" so that paths containing spaces are passed to find intact.
        find "$@" -type f -exec md5sum {} \;
}

cp -av --reflink=always ~/<src> ~/<dest>
md5sum_recursive ~/<dest> > ~/dedup.before
duperemove -vhrdb 8k ~/<dest>
md5sum_recursive ~/<dest> > ~/dedup.after
diff -up ~/dedup.before ~/dedup.after

What I got (full diff attached):
--- /home/nefelim4ag/dedup.after        2015-08-26 21:36:55.773452558 +0300
+++ /home/nefelim4ag/dedup.before       2015-08-26 21:21:01.203600761 +0300
@@ -25139,9 +25139,9 @@ caf9d41036e46b85d90a9541e8bc9ce1  /home/
....
-0ccbc9c81a51f59dcf2ac0d102de37cb  /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
+e665b502ee977dc1c619ecbd415c91b8  /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
....
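
To pull just the affected paths out of the two checksum lists, something
like the following could be used. This is only a sketch and not part of the
original report: the function name changed_files is hypothetical, and it
assumes both lists are plain md5sum output (hash, two spaces, path) with no
escaped file names.

```shell
# changed_files BEFORE AFTER
# Print the path of every file that appears in both md5sum lists
# but with a different hash.
changed_files() {
        awk '
                # First list: remember the hash recorded for each path.
                NR == FNR { h = $1; sub(/^[^ ]+  /, ""); before[$0] = h; next }
                # Second list: report paths whose hash is no longer the same.
                { h = $1; sub(/^[^ ]+  /, ""); if ($0 in before && before[$0] != h) print $0 }
        ' "$1" "$2"
}
```

Usage would be: changed_files ~/dedup.before ~/dedup.after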

File sizes did not change, and the affected files are > 1MB.

Every time I get random data corruption.
The only dependency I have found is that a smaller block size gives more
corruption and vice versa, i.e. the more data is deduped, the more is corrupted.
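
One way to see which blocks of a file were damaged is to compare the
deduped copy against the untouched original from ~/<src>. The following is a
sketch under assumptions, not something from the original report: the name
diff_blocks is hypothetical, the default block size of 8192 just mirrors
the "-b 8k" used above, and it relies on "cmp -l" printing one line per
differing byte with a 1-based offset in the first column.

```shell
# diff_blocks ORIGINAL COPY [BLOCK_SIZE]
# Print the zero-based index of every BLOCK_SIZE-byte block (default 8192)
# in which the two files differ. Each block index is printed once.
diff_blocks() (
        bs=${3:-8192}
        # cmp -l: one line per differing byte, first field is the 1-based offset.
        cmp -l "$1" "$2" | awk -v bs="$bs" '
                { b = int(($1 - 1) / bs); if (!(b in seen)) { seen[b] = 1; print b } }
        '
)
```

Running it on each changed file (original first, deduped copy second) and
counting the output lines gives a rough number of corrupted blocks per file,
which could be compared across different block sizes.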

Judging by its SMART data, the disk does not look damaged (attached).

What can I provide to help fix this issue?
If needed, I can recompile the kernel with some parameters, if that
helps, of course.
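
A quick way to check the error-related counters in a saved smartctl log
could look like this. It is a sketch only: the function name smart_nonzero
and the choice of attributes are assumptions, and it expects the attribute
table layout shown in the attached smart.log (attribute name in the second
column, raw value in the last).

```shell
# smart_nonzero LOGFILE
# Print name and raw value of each listed error-related attribute
# whose RAW_VALUE is not 0. No output means all of them are zero.
smart_nonzero() {
        awk '
                $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ &&
                $NF != 0 { print $2, $NF }
        ' "$1"
}
```

Usage would be: smart_nonzero smart.log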

Thanks.

-- 
Have a nice day,
Timofey.

[-- Attachment #2: diff.dedup --]
[-- Type: application/octet-stream, Size: 2876 bytes --]

--- /home/nefelim4ag/dedup.after	2015-08-26 21:36:55.773452558 +0300
+++ /home/nefelim4ag/dedup.before	2015-08-26 21:21:01.203600761 +0300
@@ -25139,9 +25139,9 @@ caf9d41036e46b85d90a9541e8bc9ce1  /home/
 4352d88a78aa39750bf70cd6f27bcaa5  /home/nefelim4ag/L4D2/left4dead2/voice_ban.dt
 b087edd07ed2d6026c38f94fdc1ffcf2  /home/nefelim4ag/L4D2/left4dead2/whitelist.cfg
 b54b2d3b8367355646efc29bcd8650a0  /home/nefelim4ag/L4D2/left4dead2/pak01_007.vpk
-8af99cecdddd56377cb5f991fbfbd5ae  /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
-ed92b8a5b011129e663b708623b442af  /home/nefelim4ag/L4D2/left4dead2/pak01_002.vpk
-0ccbc9c81a51f59dcf2ac0d102de37cb  /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
+e665b502ee977dc1c619ecbd415c91b8  /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
+e0339477add6885e57931f801b1cff0c  /home/nefelim4ag/L4D2/left4dead2/pak01_002.vpk
+9354ddf40cc2ca3e1e57309b010627ee  /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
 7b97251e71742bd523a1d3ae1c7073c8  /home/nefelim4ag/L4D2/left4dead2/pak01_004.vpk
 db2e6d8c99592c17b411edc14bd43cda  /home/nefelim4ag/L4D2/left4dead2_dlc1/cfg/screenshots.cfg
 543183afe7d1f5c76b046cf6d0edc40e  /home/nefelim4ag/L4D2/left4dead2_dlc1/cfg/screenshots_undo.cfg
@@ -29831,7 +29831,7 @@ f2a33ee16390509558a4119811bca224  /home/
 d038c122fe57ba03c24571fcf4c1c753  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m1_docks.bsp
 9e79670b739db0693bd0e17b352d0e58  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m1_docks.nav
 abe1aba71611f5a90e8e1441053d3592  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m1_docks_exclude.lst
-8f828f72730b556a90f7817d51d4b3db  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m2_barge.bsp
+b8b0980b4dbc2ec7350cc8d73ba60f5d  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m2_barge.bsp
 84c8da933597fc2fd6023c32a8e68ab3  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m2_barge.nav
 09e2291041d4a30beca397776a448a26  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m2_barge_exclude.lst
 df013e5dd68891e6e0964c57b2040c32  /home/nefelim4ag/L4D2/left4dead2_dlc2/maps/c7m3_port.bsp
@@ -41686,7 +41686,7 @@ e693643d17b6adb85cc2bf7c67e29594  /home/
 2766afa985df82882ab3c9147b967dbd  /home/nefelim4ag/L4D2/left4dead2_dlc2/pak01_002.vpk
 3925cdccc66d3147c1dfc9caff491d3c  /home/nefelim4ag/L4D2/left4dead2_dlc2/pak01_dir.vpk
 8ac2da99d8ac76f15788a62f32801d73  /home/nefelim4ag/L4D2/left4dead2_dlc2/pak01_000.vpk
-1839f37b747e21d82c81a8907eaa64a6  /home/nefelim4ag/L4D2/left4dead2_dlc2/pak01_001.vpk
+e3312c063ca7569f37ba3d7b062fa7ec  /home/nefelim4ag/L4D2/left4dead2_dlc2/pak01_001.vpk
 65685fe4458c796261bca504a4d4d899  /home/nefelim4ag/L4D2/left4dead2_dlc3/maps/soundcache/c10m1_caves.manifest
 84f140f4e1a43b5f0d3a50898014fef5  /home/nefelim4ag/L4D2/left4dead2_dlc3/maps/soundcache/c10m2_drainage.manifest
 d46e7e69b1c704bdec758c421d64d4fa  /home/nefelim4ag/L4D2/left4dead2_dlc3/maps/soundcache/c10m3_ranchhouse.manifest

[-- Attachment #3: smart.log --]
[-- Type: text/x-log, Size: 10065 bytes --]

smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.0-rc8-next-20150825-0959-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile
Device Model:     WDC WD10JPCX-24UE4T0
Serial Number:    WD-WX61AC3J6551
LU WWN Device Id: 5 0014ee 6599e2c1a
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug 26 22:28:49 2015 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(18480) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 207) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x7035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   182   178   021    -    1883
  4 Start_Stop_Count        -O--CK   095   095   000    -    5413
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         POSR-K   200   200   051    -    0
  9 Power_On_Hours          -O--CK   091   091   000    -    7153
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   099   099   000    -    1340
192 Power-Off_Retract_Count -O--CK   200   200   000    -    190
193 Load_Cycle_Count        -O--CK   191   191   000    -    28327
194 Temperature_Celsius     -O---K   093   085   000    -    54
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
240 Head_Flying_Hours       -O--CK   091   091   000    -    6968
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      38  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               70%        78         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    54 Celsius
Power Cycle Min/Max Temperature:     30/55 Celsius
Lifetime    Min/Max Temperature:     17/62 Celsius
Lifetime    Average Temperature:        37 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    128 (37)

Index    Estimated Time   Temperature Celsius
  38    2015-08-26 20:21    46  ***************************
  39    2015-08-26 20:22    46  ***************************
  40    2015-08-26 20:23    46  ***************************
  41    2015-08-26 20:24    47  ****************************
 ...    ..(  5 skipped).    ..  ****************************
  47    2015-08-26 20:30    47  ****************************
  48    2015-08-26 20:31    48  *****************************
 ...    ..(  6 skipped).    ..  *****************************
  55    2015-08-26 20:38    48  *****************************
  56    2015-08-26 20:39    49  ******************************
 ...    ..(  4 skipped).    ..  ******************************
  61    2015-08-26 20:44    49  ******************************
  62    2015-08-26 20:45    50  *******************************
 ...    ..( 17 skipped).    ..  *******************************
  80    2015-08-26 21:03    50  *******************************
  81    2015-08-26 21:04    51  ********************************
 ...    ..( 11 skipped).    ..  ********************************
  93    2015-08-26 21:16    51  ********************************
  94    2015-08-26 21:17    52  *********************************
  95    2015-08-26 21:18    52  *********************************
  96    2015-08-26 21:19    52  *********************************
  97    2015-08-26 21:20    53  **********************************
 ...    ..(  2 skipped).    ..  **********************************
 100    2015-08-26 21:23    53  **********************************
 101    2015-08-26 21:24    54  ***********************************
 ...    ..(  9 skipped).    ..  ***********************************
 111    2015-08-26 21:34    54  ***********************************
 112    2015-08-26 21:35    55  ************************************
 ...    ..( 15 skipped).    ..  ************************************
   0    2015-08-26 21:51    55  ************************************
   1    2015-08-26 21:52    54  ***********************************
 ...    ..( 35 skipped).    ..  ***********************************
  37    2015-08-26 22:28    54  ***********************************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           31  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         9736  Vendor specific


Thread overview: 6+ messages
2015-08-26 19:33 Btrfs duperemove corrupt data while dedup Timofey Titovets
2015-08-26 19:52 ` Roman Mamedov
2015-08-26 20:00 ` Hugo Mills
2015-09-29 12:38 ` Timofey Titovets
2015-09-29 12:49   ` Filipe Manana
2015-09-29 14:53     ` Timofey Titovets
