linux-btrfs.vger.kernel.org archive mirror
* efficiency of btrfs cow
@ 2011-03-06 15:46 Brian J. Murrell
  2011-03-06 16:02 ` Fajar A. Nugraha
  2011-03-06 16:06 ` Calvin Walton
  0 siblings, 2 replies; 12+ messages in thread
From: Brian J. Murrell @ 2011-03-06 15:46 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2877 bytes --]

I have a backup volume on an ext4 filesystem that uses rsync and its
--link-dest option to create "hard-linked incremental" backups.  I am
sure everyone here is familiar with the technique, but in case anyone
isn't, each backup is effectively doing:

# cp -al /backup/previous-backup/ /backup/current-backup
# rsync -aAHX ... --exclude /backup / /backup/current-backup

The shortcoming of this, of course, is that a change of just 1 byte in a
(possibly huge) file requires that the whole file be recopied to the
backup.

btrfs and its CoW capability to the rescue -- again, no surprise to
anyone here.

So I replicated a few of the directories in my backup volume to a btrfs
volume using snapshots for each backup to take advantage of CoW and with
any luck, avoid entire file duplication where only some subset of the
file has changed.

Overall, it was a success.  Most backups on btrfs were smaller than
their ext4 counterparts, and across all of the replicated backups total
usage was lower.  Some, however, were significantly larger.  Here's the
analysis:

  Backup      btrfs  ext4  ratio
  ------      -----  ----  -----
monthly.22:  112GiB 113GiB  98%
monthly.21:   14GiB  14GiB  95%
monthly.20:   19GiB  20GiB  94%
monthly.19:   12GiB  13GiB  94%
monthly.18:    5GiB   6GiB  87%
monthly.17:   11GiB  12GiB  92%
monthly.16:    8GiB  10GiB  82%
monthly.15:   16GiB  11GiB 146%
monthly.14:   19GiB  20GiB  94%
monthly.13:   21GiB  22GiB  96%
monthly.12:   61GiB  67GiB  91%
monthly.11:   24GiB  22GiB 106%
monthly.10:   22GiB  19GiB 114%
 monthly.9:   12GiB  13GiB  90%
 monthly.8:   15GiB  17GiB  91%
 monthly.7:    9GiB  11GiB  87%
 monthly.6:    8GiB   9GiB  85%
 monthly.5:   16GiB  18GiB  91%
 monthly.4:   13GiB  15GiB  89%
 monthly.3:   11GiB  19GiB  62%
 monthly.2:   29GiB  22GiB 134%
 monthly.1:   23GiB  24GiB  94%
 monthly.0:    5GiB   5GiB  94%
     Total:  497GiB 512GiB  96%

btrfs use is calculated from the "df" value of the filesystem before and
after each backup.  ext4 (rsync, really) use is calculated with "du
-xks" on the whole backup volume, which, as you know, counts a multiply
hard-linked file's space only once.
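In script form, the measurement method looks roughly like this (a
sketch; the mount point and backup names are assumptions):

```shell
# btrfs side: usage delta of the whole filesystem across one backup run.
# df -k reports 1K blocks; the "used" column is what the table compares.
before=$(df -k --output=used /mnt/btrfs-test | tail -n 1)
# ... run one backup into a fresh snapshot here ...
after=$(df -k --output=used /mnt/btrfs-test | tail -n 1)
echo "btrfs use: $((after - before)) KB"

# ext4 side: du counts a multiply hard-linked file only once, so summing
# one backup directory yields just its incremental cost.
du -xks /backup/monthly.15
```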

So as you can see, for the most part, btrfs and CoW was more efficient,
but in some cases (i.e. monthly.15, monthly.11, monthly.10, monthly.2)
it was less efficient.

Taking the biggest anomaly, monthly.15, a du of just that directory on
both the btrfs and ext4 filesystems shows results I would expect:

btrfs: 136,876,580 monthly.15
ext4:  142,153,928 monthly.15

Yet the before and after "df" results show the btrfs usage higher than
ext4.  Is there some "periodic" jump in "overhead" used by btrfs that
would account for this mysterious increased usage in some of the copies?

Any other ideas for the anomalous results?

Cheers,
b.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 15:46 efficiency of btrfs cow Brian J. Murrell
@ 2011-03-06 16:02 ` Fajar A. Nugraha
  2011-03-06 16:11   ` Brian J. Murrell
                     ` (2 more replies)
  2011-03-06 16:06 ` Calvin Walton
  1 sibling, 3 replies; 12+ messages in thread
From: Fajar A. Nugraha @ 2011-03-06 16:02 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-btrfs

On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
> # cp -al /backup/previous-backup/ /backup/current-backup
> # rsync -aAHX ... --exclude /backup / /backup/current-backup
>
> The shortcoming of this of course is that it just takes 1 byte in a
> (possibly huge) file to require that the whole file be recopied to the
> backup.

If you have snapshots anyway, why not:
- create a snapshot before each backup run
- use the same directory (e.g. just /backup); no need to "cp" anything
- add "--inplace" to rsync
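In other words, something like the following sketch; the snapshot
location and naming are assumptions, and /backup must itself be a btrfs
subvolume:

```shell
# Preserve the previous state as a snapshot, then update /backup
# in place so only changed blocks are rewritten.
btrfs subvolume snapshot /backup "/snapshots/backup-$(date +%F)"
rsync -aAHX --inplace --exclude /backup / /backup/
```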

-- 
Fajar

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 15:46 efficiency of btrfs cow Brian J. Murrell
  2011-03-06 16:02 ` Fajar A. Nugraha
@ 2011-03-06 16:06 ` Calvin Walton
  2011-03-06 16:17   ` Brian J. Murrell
  2011-03-23 12:39   ` Brian J. Murrell
  1 sibling, 2 replies; 12+ messages in thread
From: Calvin Walton @ 2011-03-06 16:06 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-btrfs

On Sun, 2011-03-06 at 10:46 -0500, Brian J. Murrell wrote:
> I have a backup volume on an ext4 filesystem that is using rsync and
> it's --link-dest option to create "hard-linked incremental" backups.  I
> am sure everyone here is familiar with the technique but in case anyone
> isn't basically it's effectively doing (each backup):

> So I replicated a few of the directories in my backup volume to a btrfs
> volume using snapshots for each backup to take advantage of CoW and with
> any luck, avoid entire file duplication where only some subset of the
> file has changed.
> 
> Overall, it seems that I saw success.  Most backups on btrfs were
> smaller than their source, and overall, for all of the backups
> replicated, the use was less.  There were some however that were
> significantly larger.  Here's the analysis:

> Taking the biggest anomaly, monthly.15, a du of just that directory on
> both the btrfs and ext4 filesystems shows results I would expect:
> 
> btrfs: 136,876,580 monthly.15
> ext4:  142,153,928 monthly.15
> 
> Yet the before and after "df" results show the btrfs usage higher than
> ext4.  Is there some "periodic" jump in "overhead" used by btrfs that
> would account for this mysterious increased usage in some of the copies?

There actually is such a periodic jump in overhead, caused by the way
in which btrfs dynamically allocates space for metadata as needed by
the creation of new files, which it does whenever the free metadata
space ratio reaches a threshold (it's probably more complicated than
that, but close enough for now).

To see exactly what's going on, you should use the "btrfs filesystem df"
command to see how space is being allocated for data and metadata
separately:

ayu ~ # btrfs fi df /
Data: total=266.01GB, used=249.35GB
System, DUP: total=8.00MB, used=36.00KB
Metadata, DUP: total=3.62GB, used=1.93GB
ayu ~ # df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda4             402G  254G  145G  64% /

If you use the btrfs tool's df command to account for space in your
testing, you should get much more accurate results.
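For scripting that accounting, the two "used" figures can be pulled out
and summed; a sketch that assumes the output format shown above:

```shell
# Sum the "used" values from btrfs fi df (format as in the example
# above).  This counts real data + metadata, ignoring chunks that are
# allocated but still empty.
btrfs filesystem df /mnt/btrfs-test | \
  sed -n 's/.*used=\([0-9.]*\)\(MB\|GB\|KB\).*/\1 \2/p' | \
  awk '{ mult = ($2 == "GB") ? 1048576 : ($2 == "MB") ? 1024 : 1;
         sum += $1 * mult }
       END { printf "used: %.2f KB\n", sum }'
```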

-- 
Calvin Walton <calvin.walton@kepstin.ca>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 16:02 ` Fajar A. Nugraha
@ 2011-03-06 16:11   ` Brian J. Murrell
  2011-03-06 16:17   ` Calvin Walton
  2011-03-06 17:22   ` Freddie Cash
  2 siblings, 0 replies; 12+ messages in thread
From: Brian J. Murrell @ 2011-03-06 16:11 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 481 bytes --]

On 11-03-06 11:02 AM, Fajar A. Nugraha wrote:
> 
> If you have snapshots anyway, why not :
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to "cp" anything
> - add "--inplace" to rsync

Which is exactly what I am doing.  There is no "cp" involved in making
the btrfs copies of the existing backup.  It's simply "rsync -aAXH ...
--inplace" from the existing backup archive to the new, btrfs archive.

Cheers,
b.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 16:06 ` Calvin Walton
@ 2011-03-06 16:17   ` Brian J. Murrell
  2011-03-23 12:39   ` Brian J. Murrell
  1 sibling, 0 replies; 12+ messages in thread
From: Brian J. Murrell @ 2011-03-06 16:17 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]

On 11-03-06 11:06 AM, Calvin Walton wrote:
> 
> There actually is such a periodic jump in overhead,

Ahh.  So my instincts were correct.

> caused by the way
> which btrfs dynamically allocates space for metadata as needed by the
> creation of new files, which it does whenever the free metadata space
> ratio reaches a threshold (it's probably more complicated than that, but
> close enough for now).

Sounds fair enough.

> To see exactly what's going on, you should use the "btrfs filesystem df"
> command to see how space is being allocated for data and metadata
> separately:
> 
> ayu ~ # btrfs fi df /
> Data: total=266.01GB, used=249.35GB
> System, DUP: total=8.00MB, used=36.00KB
> Metadata, DUP: total=3.62GB, used=1.93GB
> ayu ~ # df -h /
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda4             402G  254G  145G  64% /
> 
> If you use the btrfs tool's df command to account for space in your
> testing, you should get much more accurate results.

Indeed!  Unfortunately that tool seems to be completely silent on my system:

# btrfs filesystem df /mnt/btrfs-test/
# btrfs filesystem df /mnt/btrfs-test

Where /mnt/btrfs-test is where I have the device that I created the
btrfs filesystem on mounted.  i.e.:

# grep btrfs /proc/mounts
/dev/mapper/btrfs--test-btrfs--test /mnt/btrfs-test btrfs rw,relatime 0 0

My btrfs-tools appears to be from 20101101.  The changelog says:

  * Merging upstream version 0.19+20101101.

Cheers,
b.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 16:02 ` Fajar A. Nugraha
  2011-03-06 16:11   ` Brian J. Murrell
@ 2011-03-06 16:17   ` Calvin Walton
  2011-03-06 16:18     ` Brian J. Murrell
  2011-03-06 17:22   ` Freddie Cash
  2 siblings, 1 reply; 12+ messages in thread
From: Calvin Walton @ 2011-03-06 16:17 UTC (permalink / raw)
  To: Fajar A. Nugraha; +Cc: Brian J. Murrell, linux-btrfs

On Sun, 2011-03-06 at 23:02 +0700, Fajar A. Nugraha wrote:
> On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
> > # cp -al /backup/previous-backup/ /backup/current-backup
> > # rsync -aAHX ... --exclude /backup / /backup/current-backup
> >
> > The shortcoming of this of course is that it just takes 1 byte in a
> > (possibly huge) file to require that the whole file be recopied to the
> > backup.
> 
> If you have snapshots anyway, why not :
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to "cp" anything
> - add "--inplace" to rsync

To add a bit to this: if you *do not* use the --inplace option on rsync,
rsync will rewrite the entire file instead of updating the existing
file!  This of course negates some of the benefits of btrfs's CoW
support when doing incremental backups.

-- 
Calvin Walton <calvin.walton@kepstin.ca>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 16:17   ` Calvin Walton
@ 2011-03-06 16:18     ` Brian J. Murrell
  0 siblings, 0 replies; 12+ messages in thread
From: Brian J. Murrell @ 2011-03-06 16:18 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 482 bytes --]

On 11-03-06 11:17 AM, Calvin Walton wrote:
> 
> To add a bit to this: if you *do not* use the --inplace option on rsync,
> rsync will rewrite the entire file, instead of updating the existing
> file!

Of course.  As I mentioned to Fajar previously, I am indeed using
--inplace when copying from the existing archive to the new btrfs archive.

> This of course negates some of the benefits of btrfs's COW support when
> doing incremental backups.

Absolutely.

b.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 16:02 ` Fajar A. Nugraha
  2011-03-06 16:11   ` Brian J. Murrell
  2011-03-06 16:17   ` Calvin Walton
@ 2011-03-06 17:22   ` Freddie Cash
  2 siblings, 0 replies; 12+ messages in thread
From: Freddie Cash @ 2011-03-06 17:22 UTC (permalink / raw)
  To: Fajar A. Nugraha; +Cc: Brian J. Murrell, linux-btrfs

On Sun, Mar 6, 2011 at 8:02 AM, Fajar A. Nugraha <list@fajar.net> wrote:
> On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
>> # cp -al /backup/previous-backup/ /backup/current-backup
>> # rsync -aAHX ... --exclude /backup / /backup/current-backup
>>
>> The shortcoming of this of course is that it just takes 1 byte in a
>> (possibly huge) file to require that the whole file be recopied to the
>> backup.
>
> If you have snapshots anyway, why not :
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to "cp" anything
> - add "--inplace" to rsync

You may also want to test with and without --no-whole-file.  That's
most useful when the two filesystems are on the same machine, and it
should reduce the amount of data copied around, as it forces rsync to
use only file deltas.  This is very much a win on ZFS, which is also
CoW, so it should be a win on Btrfs as well.


-- 
Freddie Cash
fjwcash@gmail.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-06 16:06 ` Calvin Walton
  2011-03-06 16:17   ` Brian J. Murrell
@ 2011-03-23 12:39   ` Brian J. Murrell
  2011-03-23 15:53     ` Chester
  2011-03-23 17:36     ` Kolja Dummann
  1 sibling, 2 replies; 12+ messages in thread
From: Brian J. Murrell @ 2011-03-23 12:39 UTC (permalink / raw)
  To: Calvin Walton; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1782 bytes --]

On 11-03-06 11:06 AM, Calvin Walton wrote:
> 
> To see exactly what's going on, you should use the "btrfs filesystem df"
> command to see how space is being allocated for data and metadata
> separately:

OK.  So with an empty filesystem, before my first copy (i.e. the base on
which the next copy will CoW from) df reports:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/btrfs--test-btrfs--test
                     922746880        56 922746824   1% /mnt/btrfs-test

and btrfs fi df reports:

Data: total=8.00MB, used=0.00
Metadata: total=1.01GB, used=24.00KB
System: total=12.00MB, used=4.00KB

after the first copy df and btrfs fi df report:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/btrfs--test-btrfs--test
                     922746880 121402328 801344552  14% /mnt/btrfs-test

root@linux:/mnt/btrfs-test# cat .snapshots/monthly.22/metadata/btrfs_df-stop
Data: total=110.01GB, used=109.26GB
Metadata: total=5.01GB, used=3.26GB
System: total=12.00MB, used=24.00KB

So it's clear that total usage (as reported by df) was 121,402,328KB but
Metadata has two values:

Metadata: total=5.01GB, used=3.26GB

What's the difference between total and used?  And for that matter,
what's the difference between the total and used for Data
(total=110.01GB, used=109.26GB)?

Even if I take the largest values (i.e. the total values) for Data and
Metadata (each converted to KB first) and add them up, the sum is
120,607,211.52KB, which is not quite the 121,402,328KB that df reports.
There is a 795,116.48KB discrepancy.
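That arithmetic can be reproduced directly (the df figure is the "Used"
column quoted above):

```shell
# Data total + Metadata total, in GB, converted to 1K blocks
# (1 GB here = 1048576 KB), versus what df reported.
awk 'BEGIN {
  kb = 1024 * 1024
  sum = (110.01 + 5.01) * kb        # prints 120607211.52
  df_used = 121402328               # "Used" column from df, in KB
  printf "sum of totals: %.2f KB\n", sum
  printf "discrepancy:   %.2f KB\n", df_used - sum   # prints 795116.48
}'
```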

In any case, which value from a "btrfs fi df" should I be subtracting
from df's accounting to get a real accounting of the amount of data used?

Cheers,
b.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-23 12:39   ` Brian J. Murrell
@ 2011-03-23 15:53     ` Chester
  2011-03-23 16:19       ` Brian J. Murrell
  2011-03-23 17:36     ` Kolja Dummann
  1 sibling, 1 reply; 12+ messages in thread
From: Chester @ 2011-03-23 15:53 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: Calvin Walton, linux-btrfs

I'm not a developer, but I think it goes something like this:
btrfs doesn't write filesystem structures across the entire
device/partition at format time; rather, it dynamically allocates
space as data is written.  That's why formatting a disk with btrfs
can be so fast.

On Wed, Mar 23, 2011 at 12:39 PM, Brian J. Murrell
<brian@interlinx.bc.ca> wrote:
>
> On 11-03-06 11:06 AM, Calvin Walton wrote:
> >
> > To see exactly what's going on, you should use the "btrfs filesystem df"
> > command to see how space is being allocated for data and metadata
> > separately:
>
> OK.  So with an empty filesystem, before my first copy (i.e. the base on
> which the next copy will CoW from) df reports:
>
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/mapper/btrfs--test-btrfs--test
>                      922746880        56 922746824   1% /mnt/btrfs-test
>
> and btrfs fi df reports:
>
> Data: total=8.00MB, used=0.00
> Metadata: total=1.01GB, used=24.00KB
> System: total=12.00MB, used=4.00KB
>
> after the first copy df and btrfs fi df report:
>
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/mapper/btrfs--test-btrfs--test
>                      922746880 121402328 801344552  14% /mnt/btrfs-test
>
> root@linux:/mnt/btrfs-test# cat .snapshots/monthly.22/metadata/btrfs_df-stop
> Data: total=110.01GB, used=109.26GB
> Metadata: total=5.01GB, used=3.26GB
> System: total=12.00MB, used=24.00KB
>
> So it's clear that total usage (as reported by df) was 121,402,328KB but
> Metadata has two values:
>
> Metadata: total=5.01GB, used=3.26GB
>
> What's the difference between total and used?  And for that matter,
> what's the difference between the total and used for Data
> (total=110.01GB, used=109.26GB)?
>
> Even if I take the largest values (i.e. the total values) for Data and
> Metadata (each converted to KB first) and add them up they are:
> 120,607,211.52 which is not quite the 121,402,328 that df reports.
> There is a 795,116.48KB discrepancy.
>
> In any case, which value from a btrfs fi df should I be subtracting from
> df's accounting to get a real accounting of the amount of data used?
>
> Cheers,
> b.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-23 15:53     ` Chester
@ 2011-03-23 16:19       ` Brian J. Murrell
  0 siblings, 0 replies; 12+ messages in thread
From: Brian J. Murrell @ 2011-03-23 16:19 UTC (permalink / raw)
  To: Chester; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 522 bytes --]

On 11-03-23 11:53 AM, Chester wrote:
> I'm not a developer, but I think it goes something like this:
> btrfs doesn't write the filesystem on the entire device/partition at
> format time, rather, it dynamically increases the size of the
> filesystem as data is used. That's why formating a disk in btrfs can
> be so fast.

Indeed, this much is understood, which is why I am using btrfs fi df to
try to determine how much of the increase in raw device usage is due to
the dynamic allocation of metadata.

Cheers,
b.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: efficiency of btrfs cow
  2011-03-23 12:39   ` Brian J. Murrell
  2011-03-23 15:53     ` Chester
@ 2011-03-23 17:36     ` Kolja Dummann
  1 sibling, 0 replies; 12+ messages in thread
From: Kolja Dummann @ 2011-03-23 17:36 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: Calvin Walton, linux-btrfs

> So it's clear that total usage (as reported by df) was 121,402,328KB but
> Metadata has two values:
>
> Metadata: total=5.01GB, used=3.26GB
>
> What's the difference between total and used?  And for that matter,
> what's the difference between the total and used for Data
> (total=110.01GB, used=109.26GB)?
>

"total" is the space allocated (reserved) for one kind of usage
(metadata or data); space allocated for one kind of usage can't be used
for anything else.  "used" is the space actually consumed out of what
has been allocated for that kind of usage.

The wiki gives an overview of how to interpret the values:

https://btrfs.wiki.kernel.org/index.php/FAQ#btrfs_filesystem_df_.2Fmountpoint

cheers, Kolja.

^ permalink raw reply	[flat|nested] 12+ messages in thread
