* Why so much "btrfs send" data for "cp -a --reflink"?
From: Torsten Bronger @ 2020-10-05 7:54 UTC (permalink / raw)
To: linux-btrfs
Hello!
I have two subvolumes, A and B. A contains 50GB of data; B is empty.
Neither is a snapshot of the other. Now, I copy all data from A to B
with "cp -a --reflink A/* B". This copy takes less than a
second, so apparently no bulk data was duplicated. "diff -rq A B"
prints nothing. So far, so good.
However, it surprises me that
    btrfs send -p A B | wc -c
reports 12GB. I would have expected either very little data (say, a
couple of MB) or the whole 50GB (because A is not a real parent of B
and never has been). An additional "-c A" does not change anything.
Why is this? In other words, what makes up those 12GB?
It may be irrelevant, but the 50GB consist almost entirely of a single
VirtualBox .vdi file, which is somewhat fragmented (filefrag reports
16000 extents).
Regards,
Torsten.
--
Torsten Bronger
* Re: Why so much "btrfs send" data for "cp -a --reflink"?
From: Torsten Bronger @ 2020-10-06 12:48 UTC (permalink / raw)
To: linux-btrfs
Hello!
Torsten Bronger writes:
> I have two subvolumes, A and B. A contains 50GB of data; B is empty.
> Neither is a snapshot of the other. Now, I copy all data from A to B
> with "cp -a --reflink A/* B". This copy takes less than a
> second, so apparently no bulk data was duplicated. "diff -rq A B"
> prints nothing. So far, so good.
>
> However, it surprises me that
>
> btrfs send -p A B | wc -c
>
> reports 12GB.
I have uploaded the output of "btrfs receive --dump" to
<https://wilson.bronger.org/btrfs-receive.dump.xz>. It apparently
contains many "write" commands. Why doesn't btrfs detect that all
those extents are already present in the parent?
Regards,
Torsten.
--
Torsten Bronger
* Re: Why so much "btrfs send" data for "cp -a --reflink"?
From: Filipe Manana @ 2020-10-06 13:34 UTC (permalink / raw)
To: linux-btrfs
On Tue, Oct 6, 2020 at 1:57 PM Torsten Bronger
<bronger@physik.rwth-aachen.de> wrote:
>
> Hello!
>
> Torsten Bronger writes:
>
> > I have two subvolumes, A and B. A contains 50GB of data; B is empty.
> > Neither is a snapshot of the other. Now, I copy all data from A to B
> > with "cp -a --reflink A/* B". This copy takes less than a
> > second, so apparently no bulk data was duplicated. "diff -rq A B"
> > prints nothing. So far, so good.
> >
> > However, it surprises me that
> >
> > btrfs send -p A B | wc -c
> >
> > reports 12GB.
>
> I have uploaded the output of "btrfs receive --dump" to
> <https://wilson.bronger.org/btrfs-receive.dump.xz>. It apparently
> contains many "write" commands. Why doesn't btrfs detect that all
> those extents are already present in the parent?
I haven't looked at the dump, and don't have the time to do so right now.
But since the files are VM images, very likely what you are seeing are
writes full of zero bytes.
This is because the current send protocol does not support holes;
instead, it issues write operations with a bunch of zeros.
Holes are very common in VM images in general.
Just look at a few write commands, grab the file name, offset and
length, then check whether reading the corresponding file range
returns only zeros.
If so, then it has nothing to do with deduplication/reflinks, just the
lack of support for hole punching commands in the send stream.
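The check described above could be scripted roughly as follows. This is only a minimal sketch, not something posted in the thread: the function name range_is_all_zeros and its parameters are made up for illustration, and the file name, offset and length are assumed to have been copied by hand from a "write" line in the "btrfs receive --dump" output.

```python
def range_is_all_zeros(path, offset, length, chunk_size=1 << 20):
    """Return True if the bytes [offset, offset + length) of `path` are all zero.

    Hypothetical helper for illustration. If the range extends past the
    end of the file, only the bytes that actually exist are examined.
    """
    remaining = length
    with open(path, "rb") as f:
        f.seek(offset)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached end of file before `length` bytes were read
            # bytes.count(0) counts zero-valued bytes in the chunk.
            if chunk.count(0) != len(chunk):
                return False
            remaining -= len(chunk)
    return True
```

For example, for a write command that names some file with offset 1048576 and length 49152 (made-up values), calling range_is_all_zeros on that file in subvolume B with those two numbers would tell whether that particular write carried only zeros.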
Cheers.
>
> Regards,
> Torsten.
>
> --
> Torsten Bronger
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
* Re: Why so much "btrfs send" data for "cp -a --reflink"?
From: Torsten Bronger @ 2020-10-06 15:39 UTC (permalink / raw)
To: linux-btrfs
Hello!
Filipe Manana writes:
> [...]
>
> But since the files are VM images, very likely what you are seeing
> are writes full of zero bytes. This is because the current send
> protocol does not support holes; instead, it issues write
> operations with a bunch of zeros. Holes are very common in VM
> images in general.
Thank you for your answer! It is plausible: I created a 10G file
with random content, reflink-copied it, and the "btrfs send" output
for it is almost nothing.
> Just look at a few write commands, grab the file name, offset and
> length, then check whether reading the corresponding file range
> returns only zeros.
I haven't tested that, but the "btrfs send" stream is 12G, of which
1.5G are zeroes, while the source file contains 6.8GB of zeroes. Can
the holes still be the explanation?
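Zero bytes in a file can come either from real holes or from allocated extents that happen to contain zeros, so the two figures above need not match. One way to measure how much of a file is actually holes is the SEEK_DATA/SEEK_HOLE interface of lseek. The following is a rough sketch under the assumption of a Linux system and a filesystem that reports holes; the function name hole_and_data_bytes is invented here, and the sketch says nothing about how "btrfs send" itself treats those ranges.

```python
import errno
import os


def hole_and_data_bytes(path):
    """Return (hole_bytes, data_bytes) for `path` via SEEK_DATA/SEEK_HOLE.

    Hypothetical helper for illustration. On filesystems that do not
    report holes, the whole file is counted as data.
    """
    hole = data = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                data_start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    # No data after `pos`: the rest of the file is a hole.
                    hole += size - pos
                    break
                raise
            hole += data_start - pos
            # The end of the file counts as an implicit hole, so this
            # always returns an offset greater than data_start.
            data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
            data += data_end - data_start
            pos = data_end
    finally:
        os.close(fd)
    return hole, data
```

Comparing the hole total with the number of zero bytes counted in the file would show how many of the zeros are real holes and how many are written-out zero data.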
Regards,
Torsten.
--
Torsten Bronger