From: Filipe Manana <fdmanana@kernel.org>
To: Glenn Washburn <development@efficientek.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs send/receive not always sharing extents
Date: Mon, 10 Oct 2022 10:42:18 +0100
Message-ID: <20221010094218.GA2141122@falcondesktop>
In-Reply-To: <20221008005704.795b44b0@crass-HP-ZBook-15-G2>
On Sat, Oct 08, 2022 at 12:57:04AM -0500, Glenn Washburn wrote:
> I've got two reflinked files in a subvol that I'm sending/receiving to
> a different btrfs filesystem and they are not sharing extents on the
> receiving side. Other reflinked files in the same subvol are being
> reflinked on the receive side. The send side has a fairly old creation
> date if that matters. Attached is the receive log and a diff of
> filefrag's output for the files on the source volume to show that the
> two files (IMG_20200402_143055.dng and IMG_20200402_143055.dng.ref) are
> reflinked on the source volume. This is a somewhat minimal example of
> what's happening on a big send that I'm doing that is failing because
> the receive side is too small to hold the data when the reflinks are
> broken. Is this a bug? or what can I do to get send to see these files
> are reflinked?
send/receive only guarantees that the destination ends up with the same
data as the source.
It doesn't guarantee that extents are shared as in the source filesystem,
that the extent layout is the same, or that holes are preserved, for example.
There are two main reasons why extents often don't get cloned during
send/receive:
1) The extent is shared more than 64 times in the source filesystem.
We have this limitation because figuring out all inodes/roots that
share an extent can be expensive, and can therefore massively slow down
send operations.
2) Even when an extent is shared fewer than 64 times in the source
filesystem, we often don't clone the entirety of an extent and end up
issuing write operations for the remaining part(s). This is due to
algorithmic complexity as well, as identifying the best source for
cloning an extent can be expensive and considerably slow down send
operations.
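
As a rough illustration of the two cases above, here is a toy Python model.
It is not the kernel's send code: MAX_EXTENT_REFS, plan_extent() and the
"clonable" argument are hypothetical names made up for this sketch, and only
the limit of 64 comes from the text above. It shows how one extent can end up
as a full write, a full clone, or a clone plus writes for the remaining parts.

```python
from typing import List, Optional, Tuple

MAX_EXTENT_REFS = 64  # limit quoted above; the constant name is hypothetical

Op = Tuple[str, int, int]  # (kind, file offset, length)


def plan_extent(offset: int, length: int, ref_count: int,
                clonable: Optional[Tuple[int, int]]) -> List[Op]:
    """Toy model: decide which operations cover one file extent.

    clonable is the (offset, length) range for which a clone source was
    found, or None if no source was found.
    """
    end = offset + length
    # Case 1: too many references, so no clone source is looked up at all.
    if ref_count > MAX_EXTENT_REFS or clonable is None:
        return [("write", offset, length)]
    # Clamp the clonable range to the extent being sent.
    c_start = max(offset, clonable[0])
    c_end = min(end, clonable[0] + clonable[1])
    if c_start >= c_end:
        return [("write", offset, length)]
    # Case 2: clone what the chosen source covers, write the rest.
    ops: List[Op] = []
    if c_start > offset:
        ops.append(("write", offset, c_start - offset))
    ops.append(("clone", c_start, c_end - c_start))
    if c_end < end:
        ops.append(("write", c_end, end - c_end))
    return ops
```

In this model, every "write" entry is data that takes new space on the
receiving side, which is why the destination can need more room than the
source.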
I have some work in progress and ideas to speed up send in some cases,
but I'm afraid we'll always have some limitations - in the best case
we can improve on them, but not eliminate them completely.
You can run a dedupe tool on the destination filesystem to get the
extents shared.
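
For reference, a minimal sketch of the kind of request a dedupe tool issues,
using the Linux FIDEDUPERANGE ioctl from Python. The function names, the
16 MiB chunk size and the whole-file strategy are assumptions of this sketch,
not something prescribed above. The kernel compares the two ranges itself and
only shares them if the data is identical, so the call cannot change file
contents. The destination file must be writable for this to work.

```python
import fcntl
import os
import struct

# _IOWR(0x94, 54, struct file_dedupe_range) from <linux/fs.h>
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HDR = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")

CHUNK = 16 * 1024 * 1024  # assumption: keep each request modest in size


def build_request(src_off: int, length: int, dest_fd: int, dest_off: int) -> bytearray:
    """Pack one request with a single destination range."""
    return bytearray(HDR.pack(src_off, length, 1, 0, 0)
                     + INFO.pack(dest_fd, dest_off, 0, 0, 0))


def parse_result(buf: bytearray):
    """Return (bytes_deduped, status) as filled in by the kernel."""
    _, _, deduped, status, _ = INFO.unpack_from(buf, HDR.size)
    return deduped, status


def dedupe_files(src_path: str, dest_path: str) -> int:
    """Try to share dest_path's data with src_path; return bytes deduped."""
    total = 0
    with open(src_path, "rb") as src, open(dest_path, "rb+") as dest:
        size = os.fstat(src.fileno()).st_size
        off = 0
        while off < size:
            buf = build_request(off, min(CHUNK, size - off), dest.fileno(), off)
            fcntl.ioctl(src.fileno(), FIDEDUPERANGE, buf)  # buf is updated in place
            deduped, status = parse_result(buf)
            # Stop on differing data, an error status, or no progress.
            if status != FILE_DEDUPE_RANGE_SAME or deduped == 0:
                break
            off += deduped
            total += deduped
    return total
```

Called on the two received copies of a file, dedupe_files() returns how many
bytes ended up shared; running filefrag -v on both files afterwards should
then show the same physical offsets.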
>
> Glenn
> --- /dev/fd/63 2022-10-08 00:31:46.783138591 -0500
> +++ /dev/fd/62 2022-10-08 00:31:46.787138126 -0500
> @@ -1,5 +1,5 @@
> Filesystem type is: 9123683e
> -File size of /media/test-btrfs/test/1.ro/IMG_20200402_143055.dng is 24674116 (6024 blocks of 4096 bytes)
> +File size of /media/test-btrfs/test/1.ro/IMG_20200402_143055.dng.ref is 24674116 (6024 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 6023: 1131665768..1131671791: 6024: last,shared,eof
> -/media/test-btrfs/test/1.ro/IMG_20200402_143055.dng: 1 extent found
> +/media/test-btrfs/test/1.ro/IMG_20200402_143055.dng.ref: 1 extent found