From: "Brian J. Murrell" <brian@interlinx.bc.ca>
To: linux-btrfs@vger.kernel.org
Subject: efficiency of btrfs cow
Date: Sun, 06 Mar 2011 10:46:20 -0500
Message-ID: <il0a8d$4ev$1@dough.gmane.org>
I have a backup volume on an ext4 filesystem that uses rsync and its
--link-dest option to create "hard-linked incremental" backups. I am
sure everyone here is familiar with the technique, but in case anyone
isn't, each backup is effectively doing:

# cp -al /backup/previous-backup/ /backup/current-backup
# rsync -aAHX ... --exclude /backup / /backup/current-backup

The shortcoming of this, of course, is that a change of just 1 byte in a
(possibly huge) file requires the whole file to be recopied to the
backup.
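The shortcoming described above can be reproduced on a small scale. The following is only a sketch, assuming GNU coreutils (cp -al, stat -c); the function name hardlink_demo and the file names are made up for illustration, and the "write a temporary file, then rename" step stands in for what rsync does by default when a file has changed.

```shell
#!/bin/sh
# Sketch: a hard-linked "copy" shares one inode with the previous backup,
# but replacing the file (temp file + rename, rsync's default behaviour)
# leaves two complete, separate copies on disk.
# Assumes GNU coreutils; hardlink_demo is a hypothetical helper name.

hardlink_demo() {
    work=$(mktemp -d) || return 1
    mkdir "$work/previous-backup"
    printf 'aaaaaaaaaa' > "$work/previous-backup/big.file"

    # Step 1: hard-link the previous backup into the new one (no data copied).
    cp -al "$work/previous-backup" "$work/current-backup"
    before_prev=$(stat -c %i "$work/previous-backup/big.file")
    before_cur=$(stat -c %i "$work/current-backup/big.file")

    # Step 2: a 1-byte change arrives; the file is rewritten under a
    # temporary name and renamed over the hard link.
    printf 'aaaaaaaaab' > "$work/current-backup/.big.file.tmp"
    mv "$work/current-backup/.big.file.tmp" "$work/current-backup/big.file"
    after_cur=$(stat -c %i "$work/current-backup/big.file")

    [ "$before_prev" = "$before_cur" ] && echo "linked: same inode before the change"
    [ "$before_prev" != "$after_cur" ] && echo "changed: separate full copy after the change"
    rm -rf "$work"
}
```

Calling hardlink_demo prints one line for each of the two observations; the second line is the whole-file duplication that motivates trying CoW.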
btrfs and its CoW capability to the rescue -- again, no surprise to
anyone here.

So I replicated a few of the directories in my backup volume to a btrfs
volume, using a snapshot for each backup to take advantage of CoW and,
with any luck, avoid duplicating an entire file where only some subset
of the file has changed.
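The post does not show the commands used for the snapshot-based copies, so the following is only a guess at what one cycle might look like. It prints the commands rather than running them. The function name snapshot_backup_plan is hypothetical, each backup is assumed to be its own btrfs subvolume, and the --inplace and --no-whole-file options are an assumption added here: without them rsync rewrites a changed file as a new file, which would not share blocks with the snapshot.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one snapshot-based backup cycle.
# Assumptions, not taken from the post: backups are btrfs subvolumes, and
# rsync is told to update files in place so unchanged blocks stay shared.
# The "..." is kept from the original rsync command line.

snapshot_backup_plan() {
    prev=$1
    cur=$2
    # A writable snapshot initially shares all of its data with $prev.
    echo "btrfs subvolume snapshot $prev $cur"
    # Update the snapshot in place instead of recreating changed files.
    echo "rsync -aAHX --inplace --no-whole-file ... --exclude /backup / $cur"
}
```

For example, snapshot_backup_plan /backup/previous-backup /backup/current-backup prints the two commands for review before anything is run as root.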
Overall, it seems that I saw success. Most backups on btrfs were
smaller than their source (the ext4 copies), and across all of the
backups replicated, total usage was lower. Some, however, were
significantly larger. Here's the analysis:
Backup        btrfs     ext4      btrfs/ext4
------        -----     ----      ----------
monthly.22:   112GiB    113GiB     98%
monthly.21:    14GiB     14GiB     95%
monthly.20:    19GiB     20GiB     94%
monthly.19:    12GiB     13GiB     94%
monthly.18:     5GiB      6GiB     87%
monthly.17:    11GiB     12GiB     92%
monthly.16:     8GiB     10GiB     82%
monthly.15:    16GiB     11GiB    146%
monthly.14:    19GiB     20GiB     94%
monthly.13:    21GiB     22GiB     96%
monthly.12:    61GiB     67GiB     91%
monthly.11:    24GiB     22GiB    106%
monthly.10:    22GiB     19GiB    114%
monthly.9:     12GiB     13GiB     90%
monthly.8:     15GiB     17GiB     91%
monthly.7:      9GiB     11GiB     87%
monthly.6:      8GiB      9GiB     85%
monthly.5:     16GiB     18GiB     91%
monthly.4:     13GiB     15GiB     89%
monthly.3:     11GiB     19GiB     62%
monthly.2:     29GiB     22GiB    134%
monthly.1:     23GiB     24GiB     94%
monthly.0:      5GiB      5GiB     94%
Total:        497GiB    512GiB     96%
btrfs use is the difference between the "df" value of the filesystem
before and after each backup. ext4 (rsync, really) use is calculated
with "du -xks" on the whole backup volume, which as you know only counts
a multiply hard-linked file's space use once.
So as you can see, for the most part btrfs with CoW was more efficient,
but in some cases (i.e. monthly.15, monthly.11, monthly.10, monthly.2)
it was less efficient.
Taking the biggest anomaly, monthly.15, a du of just that directory on
both the btrfs and ext4 filesystems shows the results I would expect:

btrfs: 136,876,580 monthly.15
ext4:  142,153,928 monthly.15

Yet the before-and-after "df" results for that backup show btrfs usage
higher than ext4. Is there some "periodic" jump in "overhead" used by
btrfs that would account for this mysterious increased usage in some of
the copies?

Any other ideas for the anomalous results?
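For reference, the two du figures for monthly.15 can be put on the same footing as the table's ratio column with a one-line helper. This is only an illustrative sketch; pct is a hypothetical name, and awk is used because plain sh arithmetic has no floating point.

```shell
#!/bin/sh
# Ratio of two sizes as a whole-number percentage (pct is a made-up name).
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

# du of monthly.15 on btrfs versus ext4, figures taken from the post.
pct 136876580 142153928   # prints 96
```

By this measure the btrfs copy of monthly.15 is about 96% of the ext4 copy, in line with most rows of the table, whereas the df-based figure for the same backup is 146%. Applying pct to the whole-GiB values in the table will not always reproduce the printed percentages, which were presumably computed from unrounded figures.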
Cheers,
b.