All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: James Pharaoh <james@wellbehavedsoftware.com>
Cc: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	Mark Fasheh <mfasheh@versity.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Announcing btrfs-dedupe
Date: Mon, 14 Nov 2016 16:22:07 -0500	[thread overview]
Message-ID: <20161114212206.GK21290@hungrycats.org> (raw)
In-Reply-To: <ab787485-c6ba-cada-07c5-f226d1c395e2@wellbehavedsoftware.com>

[-- Attachment #1: Type: text/plain, Size: 3411 bytes --]

On Mon, Nov 14, 2016 at 09:07:51PM +0100, James Pharaoh wrote:
> On 14/11/16 20:51, Zygo Blaxell wrote:
> >On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
> >>On 2016-11-14 13:22, James Pharaoh wrote:
> >>>One thing I am keen to understand is if BTRFS will automatically ignore
> >>>a request to deduplicate a file if it is already deduplicated? Given the
> >>>performance I see when doing a repeat deduplication, it seems to me that
> >>>it can't be doing so, although this could be caused by the CPU usage you
> >>>mention above.
> >>
> >>What's happening is that the dedupe ioctl does a byte-wise comparison of the
> >>ranges to make sure they're the same before linking them.  This is actually
> >>what takes most of the time when calling the ioctl, and is part of why it
> >>takes longer the larger the range to deduplicate is.  In essence, it's
> >>behaving like an OS should and not trusting userspace to make reasonable
> >>requests (which is also why there's a separate ioctl to clone a range from
> >>another file instead of deduplicating existing data).
> >
> >	- the extent-same ioctl could check to see which extents
> >	are referenced by the src and dst ranges, and return success
> >	immediately without reading data if they are the same (but
> >	userspace should already know this, or it's wasting a huge amount
> >	of time before it even calls the kernel).
> 
> Yes, this is what I am talking about. I believe I should be able to read
> data about the BTRFS data structures and determine if this is the case. I
> don't care if there are false matches, due to concurrent updates, but
> there'll be a /lot/ of repeat deduplications unless I do this, because even
> if the file is identical, the mtime etc hasn't changed, and I have a record
> of previously doing a dedupe, there's no guarantee that the file hasn't been
> rewritten in place (eg by rsync), and no way that I know of to reliably
> detect if a file has been changed.
> 
> I am sure there are libraries out there which can look into the data
> structures of a BTRFS file system, I haven't researched this in detail
> though. I imagine that with some kind of lock on a BTRFS root, this could be
> achieved by simply reading the data from the disk, since I believe that
> everything is copy-on-write, so no existing data should be overwritten until
> all roots referring to it are updated. Perhaps I'm missing something
> though...

FIEMAP (VFS) and SEARCH_V2 (btrfs-specific) will both give you access
to the underlying physical block numbers.  SEARCH_V2 is non-trivial
to use without reverse-engineering significant parts of btrfs-progs.
SEARCH_V2 is a generic tree-searching tool which will give you all kinds
of information about btrfs structures...it's essential for a sophisticated
deduplicator and overkill for a simple one.

For full-file dedup using FIEMAP you only need to look at the "physical"
field of the first extent (if it's zero or the same as the other file, the
files cannot be deduplicated or are already deduplicated, respectively).
The source for 'filefrag' (from e2fsprogs) is good for learning how
FIEMAP works.

For block-level dedup you need to look at each extent individually.
That's much slower and full of additional caveats.  If you're going down
that road it's probably better to just improve duperemove instead.

> James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

  reply	other threads:[~2016-11-14 21:23 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
2016-11-07 14:02 ` David Sterba
2016-11-07 17:48   ` Mark Fasheh
2016-11-07 20:54     ` Adam Borowski
2016-11-08  2:17       ` Darrick J. Wong
2016-11-08 18:59         ` Mark Fasheh
2016-11-08 19:47           ` Darrick J. Wong
2016-11-08 19:47             ` [Ocfs2-devel] " Darrick J. Wong
2016-11-09 15:02       ` David Sterba
2016-11-08  2:40   ` Christoph Anton Mitterer
2016-11-08  6:11     ` James Pharaoh
2016-11-08 13:26     ` Austin S. Hemmelgarn
2016-11-08 16:57       ` Darrick J. Wong
2016-11-08 17:04         ` Austin S. Hemmelgarn
2016-11-08 18:49     ` Mark Fasheh
2016-11-07 17:59 ` Mark Fasheh
2016-11-07 18:49   ` James Pharaoh
2016-11-07 18:53     ` James Pharaoh
2016-11-14 18:07     ` Zygo Blaxell
2016-11-14 18:22       ` James Pharaoh
2016-11-14 18:39         ` Austin S. Hemmelgarn
2016-11-14 19:51           ` Zygo Blaxell
2016-11-14 19:56             ` Austin S. Hemmelgarn
2016-11-14 21:10               ` Zygo Blaxell
2016-11-15 12:26                 ` Austin S. Hemmelgarn
2016-11-15 17:52                   ` Zygo Blaxell
2016-11-16 22:24                     ` Niccolò Belli
2016-11-17  3:01                       ` Zygo Blaxell
2016-11-18 10:36                         ` Niccolò Belli
2016-11-14 20:07             ` James Pharaoh
2016-11-14 21:22               ` Zygo Blaxell [this message]
2016-11-14 18:43         ` Zygo Blaxell
2016-11-08 11:06 ` Niccolò Belli
2016-11-08 11:38   ` James Pharaoh
2016-11-08 16:57     ` Niccolò Belli
2016-11-08 16:58       ` James Pharaoh
2016-11-08 17:08         ` Niccolò Belli
2016-11-14 18:27   ` Zygo Blaxell
2016-11-08 22:36 ` Saint Germain
2016-11-09 11:24   ` Niccolò Belli
2016-11-09 12:47     ` Saint Germain
2016-11-13 12:45   ` James Pharaoh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161114212206.GK21290@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=ahferroin7@gmail.com \
    --cc=james@wellbehavedsoftware.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@versity.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.