All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: James Pharaoh <james@wellbehavedsoftware.com>,
	Mark Fasheh <mfasheh@versity.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Announcing btrfs-dedupe
Date: Mon, 14 Nov 2016 14:51:02 -0500	[thread overview]
Message-ID: <20161114195102.GI21290@hungrycats.org> (raw)
In-Reply-To: <dfb1372d-8bff-93ff-1ba7-d636e6c7fe4d@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]

On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 13:22, James Pharaoh wrote:
> >One thing I am keen to understand is if BTRFS will automatically ignore
> >a request to deduplicate a file if it is already deduplicated? Given the
> >performance I see when doing a repeat deduplication, it seems to me that
> >it can't be doing so, although this could be caused by the CPU usage you
> >mention above.
> What's happening is that the dedupe ioctl does a byte-wise comparison of the
> ranges to make sure they're the same before linking them.  This is actually
> what takes most of the time when calling the ioctl, and is part of why it
> takes longer the larger the range to deduplicate is.  In essence, it's
> behaving like an OS should and not trusting userspace to make reasonable
> requests (which is also why there's a separate ioctl to clone a range from
> another file instead of deduplicating existing data).

Deduplicating an extent that may might be concurrently modified during the
dedup is a reasonable userspace request.  In the general case there's
no way for userspace to ensure that it's not happening.

That said, some optimization is possible (although there are good reasons
not to bother with optimization in the kernel):

	- VFS could recognize when it has two separate references to
	the same physical extent and not re-read the same data twice
	(but that requires teaching VFS how to do CoW in general, and is
	hard for political reasons on top of the obvious technical ones).

	- the extent-same ioctl could check to see which extents
	are referenced by the src and dst ranges, and return success
	immediately without reading data if they are the same (but
	userspace should already know this, or it's wasting a huge amount
	of time before it even calls the kernel).

> TBH, even though it's kind of annoying from a performance perspective, it's
> a rather nice safety net to have.  For example, one of the cases where I do
> deduplication is a couple of directories where each directory is an
> overlapping partial subset of one large tree which I keep elsewhere.  In
> this case, I can tell just by filename exactly what files might be
> duplicates, so the ioctl's check lets me just call the ioctl on all
> potential duplicates (after checking size, no point in wasting time if the
> files obviously aren't duplicates), and have it figure out whether or not
> they can be deduplicated.
> >
> >In any case, I'm considering some digging into the filesystem structures
> >to see if I can work this out myself before i do any deduplication. I'm
> >fairly sure this should be relatively simple to work out, at least well
> >enough for my purposes.
> Sadly, there's no way to avoid doing so right now.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

  reply	other threads:[~2016-11-14 19:51 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
2016-11-07 14:02 ` David Sterba
2016-11-07 17:48   ` Mark Fasheh
2016-11-07 20:54     ` Adam Borowski
2016-11-08  2:17       ` Darrick J. Wong
2016-11-08 18:59         ` Mark Fasheh
2016-11-08 19:47           ` Darrick J. Wong
2016-11-08 19:47             ` [Ocfs2-devel] " Darrick J. Wong
2016-11-09 15:02       ` David Sterba
2016-11-08  2:40   ` Christoph Anton Mitterer
2016-11-08  6:11     ` James Pharaoh
2016-11-08 13:26     ` Austin S. Hemmelgarn
2016-11-08 16:57       ` Darrick J. Wong
2016-11-08 17:04         ` Austin S. Hemmelgarn
2016-11-08 18:49     ` Mark Fasheh
2016-11-07 17:59 ` Mark Fasheh
2016-11-07 18:49   ` James Pharaoh
2016-11-07 18:53     ` James Pharaoh
2016-11-14 18:07     ` Zygo Blaxell
2016-11-14 18:22       ` James Pharaoh
2016-11-14 18:39         ` Austin S. Hemmelgarn
2016-11-14 19:51           ` Zygo Blaxell [this message]
2016-11-14 19:56             ` Austin S. Hemmelgarn
2016-11-14 21:10               ` Zygo Blaxell
2016-11-15 12:26                 ` Austin S. Hemmelgarn
2016-11-15 17:52                   ` Zygo Blaxell
2016-11-16 22:24                     ` Niccolò Belli
2016-11-17  3:01                       ` Zygo Blaxell
2016-11-18 10:36                         ` Niccolò Belli
2016-11-14 20:07             ` James Pharaoh
2016-11-14 21:22               ` Zygo Blaxell
2016-11-14 18:43         ` Zygo Blaxell
2016-11-08 11:06 ` Niccolò Belli
2016-11-08 11:38   ` James Pharaoh
2016-11-08 16:57     ` Niccolò Belli
2016-11-08 16:58       ` James Pharaoh
2016-11-08 17:08         ` Niccolò Belli
2016-11-14 18:27   ` Zygo Blaxell
2016-11-08 22:36 ` Saint Germain
2016-11-09 11:24   ` Niccolò Belli
2016-11-09 12:47     ` Saint Germain
2016-11-13 12:45   ` James Pharaoh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161114195102.GI21290@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=ahferroin7@gmail.com \
    --cc=james@wellbehavedsoftware.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@versity.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.