linux-btrfs.vger.kernel.org archive mirror
From: Hugo Mills <hugo@carfax.org.uk>
To: Timofey Titovets <nefelim4ag@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Btrfs duperemove corrupt data while dedup
Date: Wed, 26 Aug 2015 20:00:05 +0000
Message-ID: <20150826200005.GF23769@carfax.org.uk>
In-Reply-To: <CAGqmi76J4XWv+ureSG66Da_E-xq3MqGahbhsaTR6Sf2rSHd4Ug@mail.gmail.com>

On Wed, Aug 26, 2015 at 10:33:38PM +0300, Timofey Titovets wrote:
> Hello guys,
> I like btrfs and want to put it into production soon.
> One of the features I want to use is deduplication.
> 
> I frequently test duperemove on btrfs and have already seen this problem.
> I know that btrfs used to change mtime while deduping, but after the dedup
> fixes from Mark (https://github.com/markfasheh), I tried comparing
> checksums.
> 
> As far as I know, duperemove uses a kernel ioctl for deduping, i.e. this
> is not a duperemove issue; the kernel must keep the data consistent.
> 
> The file system is fresh, and btrfs check does not show any metadata corruption.
> 
> Github issue:
> https://github.com/markfasheh/duperemove/issues/91
> 
> System info:
> $ uname -a
> Linux titovetst-beplan 4.2.0-rc8-next-20150825-0959-ARCH #1 SMP Wed Aug 26 10:27:18 MSK 2015 x86_64 GNU/Linux
> 
> Mount options:
> rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@home
> 
> Okay, here is how I found it:
> 
> md5sum_recursive(){
>         find "$@" -type f -exec md5sum {} \;
> }
> 
> cp -av --reflink=always ~/<src> ~/<dest>
> md5sum_recursive ~/<dest> > ~/dedup.before
> duperemove -vhrdb 8k ~/<dest>
> md5sum_recursive ~/<dest> > ~/dedup.after
> diff -up ~/dedup.before ~/dedup.after
> 
> What I got (full diff attached):
> --- /home/nefelim4ag/dedup.after        2015-08-26 21:36:55.773452558 +0300
> +++ /home/nefelim4ag/dedup.before       2015-08-26 21:21:01.203600761 +0300
> @@ -25139,9 +25139,9 @@ caf9d41036e46b85d90a9541e8bc9ce1  /home/
> ....
> -0ccbc9c81a51f59dcf2ac0d102de37cb  /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
> +e665b502ee977dc1c619ecbd415c91b8  /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
> ....

   Note that these are two different files, and would therefore be
expected to have different checksums. My guess would be that the order
of enumeration for the find is different in some way, and you should
sort the output before comparing it.
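
   A minimal sketch of that sorted comparison, assuming GNU coreutils;
the helper name md5sum_sorted and the variable "$dest" are made up for
illustration and are not from the original report:

```shell
# Checksum every regular file under the given paths, then sort the listing
# by file name so it does not depend on the order in which find
# enumerates the directory entries.
md5sum_sorted() {
    find "$@" -type f -exec md5sum {} + | LC_ALL=C sort -k 2
}

# Hypothetical usage; "$dest" stands for the directory being deduplicated:
#   md5sum_sorted "$dest" > before.md5
#   duperemove -vhrdb 8k "$dest"
#   md5sum_sorted "$dest" > after.md5
#   diff -u before.md5 after.md5
```

   With both listings in the same order, any line that diff still
reports refers to the same file name on both sides, so a differing
checksum there would point at a real change in content.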

   Hugo.

> File sizes have not changed, and the files are > 1MB.
> 
> Every time, I get random data corruption.
> The only dependency I have found is that a smaller block size -> more
> corruption and vice versa, i.e. more data deduped -> more corrupted.
> 
> Judging by its SMART data, the disk does not look damaged. (attached)
> 
> What can I provide to help fix this issue?
> If needed, I can recompile the kernel with some parameters if that would
> help, of course.

-- 
Hugo Mills             | Something must be done!
hugo@... carfax.org.uk | This is something.
http://carfax.org.uk/  | Therefore we will do it!
PGP: E2AB1DE4          |                                  Management syllogism

Thread overview: 6+ messages
2015-08-26 19:33 Btrfs duperemove corrupt data while dedup Timofey Titovets
2015-08-26 19:52 ` Roman Mamedov
2015-08-26 20:00 ` Hugo Mills [this message]
2015-09-29 12:38 ` Timofey Titovets
2015-09-29 12:49   ` Filipe Manana
2015-09-29 14:53     ` Timofey Titovets
