* Cloning a Btrfs partition
       [not found] <bb747e0c-d6d8-4f60-a3f6-cf64c856515e@mail.placs.net>
@ 2011-12-07 18:35 ` BJ Quinn
  2011-12-07 18:39   ` Freddie Cash
  2011-12-08 10:00   ` Stephane CHAZELAS
  0 siblings, 2 replies; 24+ messages in thread
From: BJ Quinn @ 2011-12-07 18:35 UTC (permalink / raw)
To: linux-btrfs

I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's about 2/3
full and has lots of snapshots. I've written a script that runs through
the snapshots and copies the data efficiently (rsync --inplace
--no-whole-file) from the main 6TB array to a backup array, creating
snapshots on the backup array and then continuing on copying the next
snapshot. Problem is, it looks like it will take weeks to finish.

I've tried simply using dd to clone the btrfs partition, which
technically appears to work, but then it appears that the UUID between
the arrays is identical, so I can only mount one or the other. This
means I can't continue to simply update the backup array with the new
snapshots created on the main array (my script is capable of "catching
up" the backup array with the new snapshots, but if I can't mount both
arrays...).

Any suggestions?

-BJ Quinn

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-07 18:35 ` Cloning a Btrfs partition BJ Quinn
@ 2011-12-07 18:39   ` Freddie Cash
  2011-12-07 18:49     ` BJ Quinn
  2011-12-08 10:00   ` Stephane CHAZELAS
  1 sibling, 1 reply; 24+ messages in thread
From: Freddie Cash @ 2011-12-07 18:39 UTC (permalink / raw)
To: BJ Quinn; +Cc: linux-btrfs

On Wed, Dec 7, 2011 at 10:35 AM, BJ Quinn <bj@placs.net> wrote:
> I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's about
> 2/3 full and has lots of snapshots. I've written a script that runs
> through the snapshots and copies the data efficiently (rsync --inplace
> --no-whole-file) from the main 6TB array to a backup array, creating
> snapshots on the backup array and then continuing on copying the next
> snapshot. Problem is, it looks like it will take weeks to finish.
>
> I've tried simply using dd to clone the btrfs partition, which
> technically appears to work, but then it appears that the UUID between
> the arrays is identical, so I can only mount one or the other. This
> means I can't continue to simply update the backup array with the new
> snapshots created on the main array (my script is capable of "catching
> up" the backup array with the new snapshots, but if I can't mount both
> arrays...).
>
> Any suggestions?

Until an analog of "zfs send" is added to btrfs (and I believe there
are some side projects ongoing to add something similar), your only
option is the one you are currently using via rsync.

-- 
Freddie Cash
fjwcash@gmail.com

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-07 18:39 ` Freddie Cash
@ 2011-12-07 18:49   ` BJ Quinn
  2011-12-08 15:49     ` Phillip Susi
  0 siblings, 1 reply; 24+ messages in thread
From: BJ Quinn @ 2011-12-07 18:49 UTC (permalink / raw)
To: Freddie Cash; +Cc: linux-btrfs

>Until an analog of "zfs send" is added to btrfs (and I believe there
>are some side projects ongoing to add something similar), your only
>option is the one you are currently using via rsync.

Well, I don't mind using the rsync script, it's just that it's so slow.
I'd love to use my script to "keep up" the backup array, which only
takes a couple of hours and is acceptable. But starting with a blank
backup array, it takes weeks to get the backup array caught up, which
isn't realistically possible.

What I need isn't really an equivalent "zfs send" -- my script can do
that. As I remember, zfs send was pretty slow too in a scenario like
this. What I need is to be able to clone a btrfs array somehow -- dd
would be nice, but as I said I end up with the identical UUID problem.
Is there a way to change the UUID of an array?

-BJ Quinn

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-07 18:49 ` BJ Quinn
@ 2011-12-08 15:49   ` Phillip Susi
  2011-12-08 16:07     ` BJ Quinn
  2011-12-08 16:27     ` Stephane CHAZELAS
  0 siblings, 2 replies; 24+ messages in thread
From: Phillip Susi @ 2011-12-08 15:49 UTC (permalink / raw)
To: BJ Quinn; +Cc: Freddie Cash, linux-btrfs

On 12/7/2011 1:49 PM, BJ Quinn wrote:
> What I need isn't really an equivalent "zfs send" -- my script can do
> that. As I remember, zfs send was pretty slow too in a scenario like
> this. What I need is to be able to clone a btrfs array somehow -- dd
> would be nice, but as I said I end up with the identical UUID
> problem. Is there a way to change the UUID of an array?

No, btrfs send is exactly what you need. Using dd is slow because it
copies unused blocks, and requires the source fs be unmounted and the
destination be an empty partition. rsync is slow because it can't take
advantage of the btrfs tree to quickly locate the files (or parts of
them) that have changed. A btrfs send would solve all of these issues.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 15:49 ` Phillip Susi
@ 2011-12-08 16:07   ` BJ Quinn
  2011-12-08 16:09     ` Jan Schmidt
  0 siblings, 1 reply; 24+ messages in thread
From: BJ Quinn @ 2011-12-08 16:07 UTC (permalink / raw)
To: Phillip Susi; +Cc: Freddie Cash, linux-btrfs

>No, btrfs send is exactly what you need. Using dd is slow because it
>copies unused blocks, and requires the source fs be unmounted and the
>destination be an empty partition. rsync is slow because it can't take
>advantage of the btrfs tree to quickly locate the files (or parts of
>them) that have changed. A btrfs send would solve all of these issues.

Well, that depends. Using dd is slow if you have a large percentage of
the drive unused. In my case, half or more of the drive is in use, and
dd is about as efficient as is theoretically possible on the part of
the drive that is in use. You're right that it requires the drive to be
unmounted and the destination to be an empty partition, but what I want
to use dd for is to catch an empty drive up to being current and
afterwards I'll use my rsync script to keep it up to date with the
latest snapshots. Maybe btrfs send will be more efficient, but in my
experience with zfs send, dd was 10x faster unless your drive was
nearly empty.

At any rate, was someone saying that some work had already started on
something like btrfs send? Or, alternatively, given that dd would be
sufficient for my needs, is there any way to change the UUID of a btrfs
partition after I've cloned it?

-BJ

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 16:07 ` BJ Quinn
@ 2011-12-08 16:09   ` Jan Schmidt
  2011-12-08 16:28     ` BJ Quinn
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schmidt @ 2011-12-08 16:09 UTC (permalink / raw)
To: BJ Quinn; +Cc: Phillip Susi, Freddie Cash, linux-btrfs

On 08.12.2011 17:07, BJ Quinn wrote:
> At any rate, was someone saying that some work had already started on something like btrfs send?

That's right.

-Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 16:09 ` Jan Schmidt
@ 2011-12-08 16:28   ` BJ Quinn
  2011-12-08 16:41     ` Jan Schmidt
  0 siblings, 1 reply; 24+ messages in thread
From: BJ Quinn @ 2011-12-08 16:28 UTC (permalink / raw)
To: Jan Schmidt; +Cc: Phillip Susi, Freddie Cash, linux-btrfs

>> At any rate, was someone saying that some work had already started on something like btrfs send?

>That's right.

Google tells me that someone is you. :)

What Google wouldn't tell me though was whether you have something I
could test?

-BJ

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 16:28 ` BJ Quinn
@ 2011-12-08 16:41   ` Jan Schmidt
  2011-12-08 19:56     ` BJ Quinn
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schmidt @ 2011-12-08 16:41 UTC (permalink / raw)
To: BJ Quinn; +Cc: Phillip Susi, Freddie Cash, linux-btrfs

On 08.12.2011 17:28, BJ Quinn wrote:
>>> At any rate, was someone saying that some work had already started on something like btrfs send?
>
>> That's right.
>
> Google tells me that someone is you. :)
>
> What Google wouldn't tell me though was whether you have something I could test?

Well, it's telling you the right thing :-)

Currently I'm distracted by reliable backref walking, which turned out
to be a prerequisite of btrfs send. Once I have that thing done, direct
work on the send/receive functionality will continue.

As soon as there's something that can be tested, you'll find it on this
list.

-Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 16:41 ` Jan Schmidt
@ 2011-12-08 19:56   ` BJ Quinn
  2011-12-08 20:05     ` Chris Mason
  0 siblings, 1 reply; 24+ messages in thread
From: BJ Quinn @ 2011-12-08 19:56 UTC (permalink / raw)
To: Jan Schmidt; +Cc: Phillip Susi, Freddie Cash, linux-btrfs

>As soon as there's something that can be tested, you'll find it on this list.

Great, I'd love to try it. I spent a lot of time with ZFS and the zfs
send/recv functionality was very convenient.

Meanwhile, does anyone know how I can change the UUID of a btrfs
partition or are there any other suggestions?

-BJ

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 19:56 ` BJ Quinn
@ 2011-12-08 20:05   ` Chris Mason
  2011-12-08 20:38     ` BJ Quinn
  2011-12-12 21:41     ` BJ Quinn
  0 siblings, 2 replies; 24+ messages in thread
From: Chris Mason @ 2011-12-08 20:05 UTC (permalink / raw)
To: BJ Quinn; +Cc: Jan Schmidt, Phillip Susi, Freddie Cash, linux-btrfs

On Thu, Dec 08, 2011 at 01:56:59PM -0600, BJ Quinn wrote:
> >As soon as there's something that can be tested, you'll find it on this list.
>
> Great, I'd love to try it. I spent a lot of time with ZFS and the zfs
> send/recv functionality was very convenient.
>
> Meanwhile, does anyone know how I can change the UUID of a btrfs
> partition or are there any other suggestions?

You can't change the uuid of an existing btrfs partition. Well, you
can, but you have to rewrite all the metadata blocks.

The performance problem you're hitting is probably from metadata seeks
all over the place. Jeff Liu has a new snapshot diffing tool in
development that may make for less IO from rsync.

Care to share your rsync script?

-chris

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 20:05 ` Chris Mason
@ 2011-12-08 20:38   ` BJ Quinn
  0 siblings, 0 replies; 24+ messages in thread
From: BJ Quinn @ 2011-12-08 20:38 UTC (permalink / raw)
To: Chris Mason; +Cc: Jan Schmidt, Phillip Susi, Freddie Cash, linux-btrfs

> Care to share your rsync script?

Sure. It's a little raw, and makes some assumptions about my
environment, but it does the job other than the fact that it takes
weeks to run. :)

In the below example, the "main" or source FS is mounted at
"/mnt/btrfs", the "backup" or target FS at "/mnt/btrfsbackup", and this
script is located at "/mnt/btrfs/backupscripts".

I also run a slight variation of this script that makes it only "catch
up" on snapshots that don't already exist on the backup FS simply by
commenting out the rsync command directly above the echo that says
"Export resynced snapshot". If you don't comment that out, then the
script attempts to re-sync ALL snapshots, even ones that have already
been copied over in a previous run, effectively double checking that
everything has already been copied.

This script is as efficient as I know how to make it, as it uses the
--no-whole-file and --inplace rsync switches to prevent btrfs from
thinking lots of blocks have changed that haven't really changed and
eating up lots of space. Also, an assumption is made that directly
under /mnt/btrfs there are many subvolumes, and that the snapshots for
these subvolumes are stored under /mnt/btrfs/snapshots/[subvol
name]/[snap name].

Lastly, please note that I *DO* understand the purpose of the --bwlimit
switch for rsync. I've run it without that and it still takes weeks.
It's only in there now because it seemed to prevent issues I was having
where the whole system would lock up under heavy btrfs activity. I
can't remember if that was a problem I solved by switching out my SATA
controller card or by upgrading my kernel, but I don't believe I'm
having that issue anymore. FWIW.
#!/bin/bash

# The following script is for exporting snapshots from one drive to another.
# Putting the word "STOP" (all caps, without quotes) in stopexport.txt will
# abort the process at the end of the current rsync job.

DATE=`date +%Y%m%d`
DATETIME=`date +%Y%m%d%H%M%S`
SCRIPTSFOLDER="/mnt/btrfs/backupscripts"
BACKUPFOLDER="/mnt/btrfs"
EXTERNALDRIVE="/mnt/btrfsbackup"

echo "Export Started `date`"
echo

# This will create all the snapshots in the original drive's snapshots folder
# on the export drive that don't exist on the export drive.
for PATHNAME in $BACKUPFOLDER/snapshots/*
do
	if [ `cat $SCRIPTSFOLDER/stopexport.txt` = "STOP" ]; then
		echo "STOP"
		break
	fi
	SHARENAME=`basename $PATHNAME`
	btrfs subvolume create $EXTERNALDRIVE/$SHARENAME
	for SNAPPATH in $PATHNAME/*
	do
		echo $SNAPPATH
		SNAPNAME=`basename $SNAPPATH`
		if [ ! -d "$EXTERNALDRIVE/snapshots/$SHARENAME/$SNAPNAME" ]; then
			rsync -avvP --delete --bwlimit=20000 --ignore-errors --no-whole-file --inplace $SNAPPATH/ $EXTERNALDRIVE/$SHARENAME
			mkdir -p $EXTERNALDRIVE/snapshots/$SHARENAME
			btrfs subvolume snapshot $EXTERNALDRIVE/$SHARENAME $EXTERNALDRIVE/snapshots/$SHARENAME/$SNAPNAME
			echo "Export created snapshot $EXTERNALDRIVE/snapshots/$SHARENAME/$SNAPNAME"
		else
			rsync -avvP --delete --bwlimit=20000 --ignore-errors --no-whole-file --inplace $SNAPPATH/ $EXTERNALDRIVE/snapshots/$SHARENAME/$SNAPNAME
			echo "Export resynced snapshot $EXTERNALDRIVE/snapshots/$SHARENAME/$SNAPNAME"
		fi
		if [ `cat $SCRIPTSFOLDER/stopexport.txt` = "STOP" ]; then
			echo "STOP"
			break
		fi
	done;
done;

echo "Export Completed `date`"

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 20:05 ` Chris Mason
  2011-12-08 20:38   ` BJ Quinn
@ 2011-12-12 21:41   ` BJ Quinn
  2011-12-13 22:06     ` Goffredo Baroncelli
  2011-12-30  0:25     ` BJ Quinn
  1 sibling, 2 replies; 24+ messages in thread
From: BJ Quinn @ 2011-12-12 21:41 UTC (permalink / raw)
To: Chris Mason; +Cc: Jan Schmidt, Phillip Susi, Freddie Cash, linux-btrfs

>You can't change the uuid of an existing btrfs partition. Well, you
>can, but you have to rewrite all the metadata blocks.

Is there a tool that would allow me to rewrite all the metadata blocks
with a new UUID? At this point, it can't possibly take longer than the
way I'm trying to do it now...

Someone once said "Resetting the UUID on btrfs isn't a quick-and-easy
thing - you have to walk the entire tree and change every object. We've
got a bad-hack in meego that uses btrfs-debug-tree and changes the UUID
while it runs the entire tree, but it's ugly as hell."

Ok, I'll take the bad-hack. How would I actually go about using said
bad-hack?

-BJ

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-12 21:41 ` BJ Quinn
@ 2011-12-13 22:06   ` Goffredo Baroncelli
  0 siblings, 0 replies; 24+ messages in thread
From: Goffredo Baroncelli @ 2011-12-13 22:06 UTC (permalink / raw)
To: BJ Quinn, linux-btrfs

On Monday, 12 December, 2011 15:41:29 you wrote:
> >You can't change the uuid of an existing btrfs partition. Well, you
> >can, but you have to rewrite all the metadata blocks.
>
> Is there a tool that would allow me to rewrite all the metadata blocks
> with a new UUID? At this point, it can't possibly take longer than the
> way I'm trying to do it now...
>
> Someone once said "Resetting the UUID on btrfs isn't a quick-and-easy
> thing - you have to walk the entire tree and change every object. We've
> got a bad-hack in meego that uses btrfs-debug-tree and changes the UUID
> while it runs the entire tree, but it's ugly as hell."

I am looking for that. btrfs-debug-tree is capable to dump every leaf
and every node logical address. To change the UUID of a btrfs
filesystem:

On every leaf/node we should
 - update the FSID (a)
 - update the chunk_uuid [*]
 - update the checksum

for the "dev_item" items we should update the
 - device UUID (b)
 - FSID (see 'a')

for the "chunk_item" items we should update the
 - device UUID of every stripe (b)

for every superblock (three per device), we should update:
 - FSID (see 'a')
 - device uuid (see 'b')
 - for every "system chunk" item contained in the superblock we should
   update:
   - device UUID of every stripe (b)
 - update the checksum

The most complex part is to map the logical address to the physical
device. In the next days I will try (if I have enough time) to make
something...

> Ok, I'll take the bad-hack. How would I actually go about using said
> bad-hack?
>
> -BJ

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512

^ permalink raw reply	[flat|nested] 24+ messages in thread
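[Editorial note: the checklist above starts from the fsid in the superblock. As a concrete, read-only reference point, here is a minimal sketch of where that field lives on disk. Assumptions, per the documented btrfs on-disk format: the primary superblock sits at byte offset 65536, and the 16-byte fsid immediately follows the 32-byte csum field, i.e. at offset 32 inside the superblock. The helper name `read_fsid` is invented for this example.]

```shell
# Sketch: print a btrfs image's fsid, read straight from the primary
# superblock. Assumes: superblock at byte 65536, fsid = the 16 bytes at
# offset 32 inside it (right after the 32-byte checksum field).
read_fsid() {
  local img=$1 hex
  # hex-encode the 16 fsid bytes
  hex=$(dd if="$img" bs=1 skip=$((65536 + 32)) count=16 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')
  # print in conventional UUID form (8-4-4-4-12)
  printf '%s-%s-%s-%s-%s\n' \
    "${hex:0:8}" "${hex:8:4}" "${hex:12:4}" "${hex:16:4}" "${hex:20:12}"
}
```

This only reads the primary copy; actually changing the UUID would mean rewriting this field in every superblock copy and every tree-block header and then recomputing the checksums, which is exactly the walk the list above describes.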
* Re: Cloning a Btrfs partition
  2011-12-12 21:41 ` BJ Quinn
  2011-12-13 22:06   ` Goffredo Baroncelli
@ 2011-12-30  0:25   ` BJ Quinn
  2012-01-12  0:52     ` BJ Quinn
  2012-01-12  6:41     ` Chris Samuel
  1 sibling, 2 replies; 24+ messages in thread
From: BJ Quinn @ 2011-12-30 0:25 UTC (permalink / raw)
To: Chris Mason; +Cc: Jan Schmidt, Phillip Susi, Freddie Cash, linux-btrfs

Actually, I seem to be having problems where my rsync script ends up
hanging the system again. It's pretty repeatable, and the system is
completely frozen and I have to do a hard reboot. Runs for a couple of
hours and hangs the system every time.

Of course, I'm not doing anything special other than an rsync of
compressed btrfs data and snapshots. Well, that and my btrfs partitions
are on external SATA port multipliers and btrfs is used to create a two
drive RAID-0 for each partition (the source and the destination). I
tried the bwlimit switch on rsync, which seemed to allow it to go
longer between crashes, but of course that just means I'm copying the
data slower too....

I can't find anything in the usual logs. Any suggestions? I'm using
CentOS 6.2 fully updated.

-BJ

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-30  0:25 ` BJ Quinn
@ 2012-01-12  0:52   ` BJ Quinn
  1 sibling, 0 replies; 24+ messages in thread
From: BJ Quinn @ 2012-01-12 0:52 UTC (permalink / raw)
To: Chris Mason; +Cc: Jan Schmidt, Phillip Susi, Freddie Cash, linux-btrfs

Now I've managed to basically bring my system to its knees. My rsync
script that takes weeks ends up bringing the system to a crawl long
before it can ever finish. I end up with 100% of the CPU used up by the
following as shown by top:

  btrfs-endio-wri
  btrfs-delayed-m
  btrfs-transacti
  btrfs-delalloc-
  btrfs-endio-met

Now, I've got a bunch of snapshots, and the server is a backup server
that backs up all the machines on the network. It's using -o compress.
I've got a 6TB array of 2 3TB drives, that is now about 85% full.
There's lots of small files. I tried to add another drive, but it won't
ever finish a rebalance. Df shows all 9TB as part of the array, but
only shows available space as if the array was 6TB. An attempt at
copying all the data to a second array effectively brings the computer
to its knees running the threads explained above. The server never
really recovers until a hard reboot and can't ever finish running a
backup.

Are there any mount options I should change? I need the compression and
snapshots to have enough space.

-BJ

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-30  0:25 ` BJ Quinn
  2012-01-12  0:52   ` BJ Quinn
@ 2012-01-12  6:41   ` Chris Samuel
  1 sibling, 0 replies; 24+ messages in thread
From: Chris Samuel @ 2012-01-12 6:41 UTC (permalink / raw)
To: linux-btrfs

[-- Attachment #1: Type: Text/Plain, Size: 424 bytes --]

On Fri, 30 Dec 2011 11:25:58 AM BJ Quinn wrote:

> Any suggestions? I'm using CentOS 6.2 fully updated.

Are you using the 3.2 kernel as well? The RHEL kernel probably has an
old version of btrfs in it.

cheers,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 482 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 15:49 ` Phillip Susi
  2011-12-08 16:07   ` BJ Quinn
@ 2011-12-08 16:27   ` Stephane CHAZELAS
  1 sibling, 0 replies; 24+ messages in thread
From: Stephane CHAZELAS @ 2011-12-08 16:27 UTC (permalink / raw)
To: linux-btrfs

2011-12-08, 10:49(-05), Phillip Susi:
> On 12/7/2011 1:49 PM, BJ Quinn wrote:
>> What I need isn't really an equivalent "zfs send" -- my script can do
>> that. As I remember, zfs send was pretty slow too in a scenario like
>> this. What I need is to be able to clone a btrfs array somehow -- dd
>> would be nice, but as I said I end up with the identical UUID
>> problem. Is there a way to change the UUID of an array?
>
> No, btrfs send is exactly what you need. Using dd is slow because it
> copies unused blocks, and requires the source fs be unmounted.
[...]

Not necessarily, you can snapshot them (as in the method I suggested).
If your FS is already on a device mapper device, you can even get away
with not unmounting it (freeze, reload the device mapper table with a
snapshot-origin one and thaw).

> and the destination be an empty partition. rsync is slow
> because it can't take advantage of the btrfs tree to quickly
> locate the files (or parts of them) that have changed. A
> btrfs send would solve all of these issues.
[...]

When you want to clone a FS using a similar device or set of devices, a
tool like clone2fs or ntfsclone that copies only the used sectors
across sequentially would probably be a lot more efficient, as it
copies the data at the max speed of the drive, seeking as little as
possible.

-- 
Stephane

^ permalink raw reply	[flat|nested] 24+ messages in thread
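[Editorial note: the "copy only the used sectors, in order" idea behind clone2fs/ntfsclone can be sketched with plain dd. The allocation bitmap here is a hypothetical input (one character per block, '1' meaning allocated); a real tool derives it from the filesystem's own metadata, and `clone_used_blocks` is a name invented for this sketch.]

```shell
# Sketch: copy only allocated blocks, preserving their offsets, the way
# used-sector cloners do. bitmap is a stand-in: one char per block,
# '1' = in use. A real tool would read this from FS metadata instead.
clone_used_blocks() {
  local src=$1 dst=$2 bitmap=$3 bs=$4 i=0
  while [ "$i" -lt "${#bitmap}" ]; do
    if [ "${bitmap:i:1}" = "1" ]; then
      # copy block i to the same offset, without truncating the target
      dd if="$src" of="$dst" bs="$bs" skip="$i" seek="$i" count=1 \
         conv=notrunc 2>/dev/null
    fi
    i=$((i + 1))
  done
}
```

Because the blocks are visited in ascending order, the drive mostly streams rather than seeks, which is what makes this approach so much faster than per-file rsync on a mostly-full disk.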
* Re: Cloning a Btrfs partition
  2011-12-07 18:35 ` Cloning a Btrfs partition BJ Quinn
  2011-12-07 18:39   ` Freddie Cash
@ 2011-12-08 10:00   ` Stephane CHAZELAS
  2011-12-08 19:22     ` Goffredo Baroncelli
  1 sibling, 1 reply; 24+ messages in thread
From: Stephane CHAZELAS @ 2011-12-08 10:00 UTC (permalink / raw)
To: linux-btrfs

2011-12-07, 12:35(-06), BJ Quinn:
> I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's
> about 2/3 full and has lots of snapshots. I've written a
> script that runs through the snapshots and copies the data
> efficiently (rsync --inplace --no-whole-file) from the main
> 6TB array to a backup array, creating snapshots on the backup
> array and then continuing on copying the next snapshot.
> Problem is, it looks like it will take weeks to finish.
>
> I've tried simply using dd to clone the btrfs partition, which
> technically appears to work, but then it appears that the UUID
> between the arrays is identical, so I can only mount one or
> the other. This means I can't continue to simply update the
> backup array with the new snapshots created on the main array
> (my script is capable of "catching up" the backup array with
> the new snapshots, but if I can't mount both arrays...).
[...]

You can mount them if you specify the devices upon mount.

Here's a method to transfer a full FS to some other with different
layout. In this example, we're transferring from a FS on a 3GB device
(/dev/loop1) to a new FS on 2 2GB devices (/dev/loop2, /dev/loop3):

truncate -s 3G a1
truncate -s 2G b1 b2
losetup /dev/loop1 a1
losetup /dev/loop2 b1
losetup /dev/loop3 b2

# our src FS on 1 disk:
mkfs.btrfs /dev/loop1
mkdir A B
mount /dev/loop1 A

# now we can fill it up, create subvolumes and snapshots...

# at this point, we decide to make a clone of it. To do that, we
# will make a snapshot of the device. For that, we need
# temporary storage as a block device. That could be a disk
# (like a USB key) or a nbd to another host, or anything. Here,
# I'm going to use a loop device to a file. You need enough
# space to store any modification done on the src FS while
# you're doing the transfer and what is needed to do the transfer
# (I can't tell you much about that).
truncate -s 100M sa
losetup /dev/loop4 sa

umount A
size=$(blockdev --getsize /dev/loop1)
echo 0 "$size" snapshot-origin /dev/loop1 | dmsetup create a
echo 0 "$size" snapshot /dev/loop1 /dev/loop4 N 8 | dmsetup create aSnap

# now we have /dev/mapper/a as the src device which we can
# remount as such and use:
mount /dev/mapper/a A

# and aSnap as a writable snapshot of the src device, which we
# mount separately:
mount /dev/mapper/aSnap B

# The trick here is that we're going to add the two new devices
# to "B" and remove the snapshot one. btrfs will automatically
# migrate the data to the new device:
btrfs device add /dev/loop2 /dev/loop3 B
btrfs device delete /dev/mapper/aSnap B

# END

Once that's completed, you should have a copy of A in B. You may want
to watch the status of the snapshot while you're transferring to check
that it doesn't get full.

That method can't be used to do some incremental "syncing" between two
FS, for which you'd still need something similar to "zfs send"
(speaking of which, you may want to consider zfsonlinux which is now
reaching a point where it's about as stable as btrfs, same performance
level if not better and has a lot more features. I'm doing the switch
myself while waiting for btrfs to be a bit more mature)

Because of the same uuid, the btrfs commands like filesystem show will
not always give sensible outputs. I tried to rename the fsid by
changing it in the superblocks, but it looks like it is also included
in a few other places where changing it manually breaks some checksums,
so I guess someone would have to write a tool to do that job. I'm
surprised it doesn't exist already (or maybe it does and I'm not aware
of it?).

-- 
Stephane

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2011-12-08 10:00 ` Stephane CHAZELAS
@ 2011-12-08 19:22   ` Goffredo Baroncelli
  0 siblings, 0 replies; 24+ messages in thread
From: Goffredo Baroncelli @ 2011-12-08 19:22 UTC (permalink / raw)
To: Stephane CHAZELAS; +Cc: linux-btrfs

On Thursday, 08 December, 2011 10:00:54 Stephane CHAZELAS wrote:
> Because of the same uuid, the btrfs commands like filesystem
> show will not always give sensible outputs. I tried to rename
> the fsid by changing it in the superblocks, but it looks like it
> is also included in a few other places where changing it
> manually breaks some checksums, so I guess someone would have to
> write a tool to do that job. I'm surprised it doesn't exist
> already (or maybe it does and I'm not aware of it?).

The fs-uuid is recorded in the header of every tree block.

From fs/btrfs/ctree.h:

[...]
/*
 * every tree block (leaf or node) starts with this header.
 */
struct btrfs_header {
	/* these first four must match the super block */
	u8 csum[BTRFS_CSUM_SIZE];
	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
[...]

Moreover I would be worried more about the uuid of the device than the
filesystem one...

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Fwd: Cloning a Btrfs partition
@ 2013-07-29 8:21 Jan Schmidt
2013-07-29 15:32 ` BJ Quinn
0 siblings, 1 reply; 24+ messages in thread
From: Jan Schmidt @ 2013-07-29 8:21 UTC (permalink / raw)
To: BJ Quinn; +Cc: linux-btrfs, psusi, Freddie Cash
Hi BJ,
[original message rewrapped]
On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote:
> (Apologies for the double post -- forgot to send as plain text the first time
> around, so the list rejected it.)
>
> I see that there's now a btrfs send / receive and I've tried using it, but
> I'm getting the oops I've pasted below, after which the FS becomes
> unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
> the FS result in a hang). I have an internal drive (single drive) that
> contains 82GB of compressed data with a couple hundred snapshots. I tried
> taking the first snapshot and making a read only copy (btrfs subvolume
> snapshot -r) and then I connected an external USB drive and ran btrfs send /
> receive to that external drive. It starts working and gets a couple of GB in
> (I'd expect the first snapshot to be about 20GB) and then gets the following
> error. I had to use the latest copy of btrfs-progs from git, because the
> package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply
> returned "invalid argument" when trying to run btrfs send / receive. Thanks
> in advance for any info you may have.
The problem has been introduced with rbtree ulists in 3.10, commit
Btrfs: add a rb_tree to improve performance of ulist search
You should be safe to revert that commit, it's a performance optimization
attempt. Alternatively, you can apply the published fix
Btrfs: fix crash regarding to ulist_add_merge
It has not made it into 3.10 stable or 3.11, yet, but is contained in Josef's
btrfs-next
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
Thanks,
-Jan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2013-07-29  8:21 Fwd: " Jan Schmidt
@ 2013-07-29 15:32 ` BJ Quinn
  2013-07-30 10:28   ` Jan Schmidt
  0 siblings, 1 reply; 24+ messages in thread
From: BJ Quinn @ 2013-07-29 15:32 UTC (permalink / raw)
To: Jan Schmidt; +Cc: linux-btrfs, psusi, Freddie Cash

Thanks for the response! Not sure I want to roll a custom kernel on
this particular system. Any idea on when it might make it to 3.10
stable or 3.11? Or should I just revert back to 3.9?

Thanks!

-BJ

----- Original Message -----

From: "Jan Schmidt" <list.btrfs@jan-o-sch.net>
Sent: Monday, July 29, 2013 3:21:51 AM

Hi BJ,

[original message rewrapped]

On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote:
> (Apologies for the double post -- forgot to send as plain text the first time
> around, so the list rejected it.)
>
> I see that there's now a btrfs send / receive and I've tried using it, but
> I'm getting the oops I've pasted below, after which the FS becomes
> unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
> the FS result in a hang). I have an internal drive (single drive) that
> contains 82GB of compressed data with a couple hundred snapshots. I tried
> taking the first snapshot and making a read only copy (btrfs subvolume
> snapshot -r) and then I connected an external USB drive and ran btrfs send /
> receive to that external drive. It starts working and gets a couple of GB in
> (I'd expect the first snapshot to be about 20GB) and then gets the following
> error. I had to use the latest copy of btrfs-progs from git, because the
> package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply
> returned "invalid argument" when trying to run btrfs send / receive. Thanks
> in advance for any info you may have.

The problem has been introduced with rbtree ulists in 3.10, commit

  Btrfs: add a rb_tree to improve performance of ulist search

You should be safe to revert that commit, it's a performance
optimization attempt. Alternatively, you can apply the published fix

  Btrfs: fix crash regarding to ulist_add_merge

It has not made it into 3.10 stable or 3.11, yet, but is contained in
Josef's btrfs-next

  git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Thanks,
-Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2013-07-29 15:32 ` BJ Quinn
@ 2013-07-30 10:28   ` Jan Schmidt
  2013-08-19 20:45     ` BJ Quinn
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schmidt @ 2013-07-30 10:28 UTC (permalink / raw)
To: BJ Quinn; +Cc: Jan Schmidt, linux-btrfs, psusi, Freddie Cash

On Mon, July 29, 2013 at 17:32 (+0200), BJ Quinn wrote:
> Thanks for the response! Not sure I want to roll a custom kernel on this
> particular system. Any idea on when it might make it to 3.10 stable or
> 3.11? Or should I just revert back to 3.9?

I missed that it's in fact in 3.11 and if I got Liu Bo right he's going
to send it to 3.10 stable soon.

Thanks,
-Jan

> Thanks!
>
> -BJ
>
> ----- Original Message -----
>
> From: "Jan Schmidt" <list.btrfs@jan-o-sch.net>
> Sent: Monday, July 29, 2013 3:21:51 AM
>
> Hi BJ,
>
> [original message rewrapped]
>
> On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote:
>> (Apologies for the double post -- forgot to send as plain text the first time
>> around, so the list rejected it.)
>>
>> I see that there's now a btrfs send / receive and I've tried using it, but
>> I'm getting the oops I've pasted below, after which the FS becomes
>> unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
>> the FS result in a hang). I have an internal drive (single drive) that
>> contains 82GB of compressed data with a couple hundred snapshots. I tried
>> taking the first snapshot and making a read only copy (btrfs subvolume
>> snapshot -r) and then I connected an external USB drive and ran btrfs send /
>> receive to that external drive. It starts working and gets a couple of GB in
>> (I'd expect the first snapshot to be about 20GB) and then gets the following
>> error. I had to use the latest copy of btrfs-progs from git, because the
>> package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply
>> returned "invalid argument" when trying to run btrfs send / receive. Thanks
>> in advance for any info you may have.
>
> The problem has been introduced with rbtree ulists in 3.10, commit
>
>   Btrfs: add a rb_tree to improve performance of ulist search
>
> You should be safe to revert that commit, it's a performance optimization
> attempt. Alternatively, you can apply the published fix
>
>   Btrfs: fix crash regarding to ulist_add_merge
>
> It has not made it into 3.10 stable or 3.11, yet, but is contained in Josef's
> btrfs-next
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
>
> Thanks,
> -Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Cloning a Btrfs partition
  2013-07-30 10:28 ` Jan Schmidt
@ 2013-08-19 20:45 ` BJ Quinn
  2013-08-20  9:59   ` Xavier Bassery
  0 siblings, 1 reply; 24+ messages in thread
From: BJ Quinn @ 2013-08-19 20:45 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: Jan Schmidt, linux-btrfs, psusi, Freddie Cash, bo li liu

Ok, so the fix is now in 3.10.6 and I'm using that. I don't get the
hang anymore, but now I'm having a new problem.

Mount options --

rw,noatime,nodiratime,compress-force=zlib,space_cache,inode_cache,ssd

I need compression because I get a very high compression ratio with my
data and I have lots of snapshots, so it's the only way it can all fit.
I have an ssd and 24 cores anyway, so it should be fast. I need
compress-force because I have lots of files in my data which compress
typically by a 10:1 or 20:1 ratio, but btrfs likes to see them as
incompressible, so I need the compress-force flag. I've just heard good
things about space_cache and inode_cache, so I've enabled them. The ssd
option is because I do have an ssd, but I have DRBD on top of it, and
it looked like btrfs could not automatically detect that it was an ssd
(rotation speed was showing as "1").

Using the newest btrfs-progs from git, because the newest shipping
btrfs-progs on CentOS 6 returns an "invalid argument" error.

I have a filesystem with maybe 1000 snapshots. They're daily snapshots
of a filesystem that is about 24GB compressed. The total space usage is
323GB out of 469GB on an Intel SSD.

All the snapshots are writable, so I know I have to create a read-only
snapshot to copy to a backup drive.

btrfs subvolume snapshot -r /home/data/snapshots/storage\@NIGHTLY20101201 /home/data/snapshots/storageROTEMP

Then I send the snapshot to the backup drive, mounted with the same
mount options.

btrfs send /home/data/snapshots/storageROTEMP | btrfs receive /mnt/backup/snapshots/

This takes about 5 hours to transfer 24GB compressed. Uncompressed it
is about 150GB.
There is a "btrfs" process that takes 100% of one core during this
5-hour period. There are some btrfs-endio and other processes that are
using small amounts of more than one core, but the "btrfs" process
always takes 100% and always only takes one core. And iostat clearly
shows no significant disk activity, so we're completely waiting on the
btrfs command. Keep in mind that the source filesystem is on an SSD, so
it should be super fast. The destination filesystem is on a hard drive
connected via USB 2.0, but again, there's no significant disk activity.
Processor is a dual socket Xeon E5-2420.

Then I try to copy another snapshot to the backup drive, hoping that it
will keep the space efficiency of the snapshots.

mv /mnt/backup/snapshots/storageROTEMP /mnt/backup/snapshots/storage\@NIGHTLY20101201
btrfs subvolume delete /home/data/snapshots/storageROTEMP
btrfs subvolume snapshot -r /home/data/snapshots/storage\@NIGHTLY20101202 /home/data/snapshots/storageROTEMP
btrfs send /home/data/snapshots/storageROTEMP | btrfs receive /mnt/backup/snapshots/

This results in a couple of problems. First of all, it takes 5 hours
just like the first snapshot did. Secondly, it takes up another ~20GB
of data, so it's not space efficient (I expect each snapshot should add
far less than 500MB on average due to the math on how many snapshots I
have and how much total space usage I have on the main filesystem).
Finally, it doesn't even complete without error. I get the following
error after about 5 hours --

At subvol /home/data/snapshots/storageROTEMP
At subvol storageROTEMP
ERROR: send ioctl failed with -12: Cannot allocate memory
ERROR: unexpected EOF in stream.

So in the end, unless I'm doing something wrong, btrfs send is much
slower than just doing a full rsync of the first snapshot, and then
incremental rsyncs with the subsequent ones. That and btrfs send
doesn't seem to be space efficient here (again, unless I'm using it
incorrectly).

Thanks in advance for your help!
-BJ ----- Original Message ----- From: "Jan Schmidt" <mail@jan-o-sch.net> To: "BJ Quinn" <bj@placs.net> Cc: "Jan Schmidt" <list.btrfs@jan-o-sch.net>, linux-btrfs@vger.kernel.org, psusi@cfl.rr.com, "Freddie Cash" <fjwcash@gmail.com> Sent: Tuesday, July 30, 2013 5:28:00 AM Subject: Re: Cloning a Btrfs partition On Mon, July 29, 2013 at 17:32 (+0200), BJ Quinn wrote: > Thanks for the response! Not sure I want to roll a custom kernel on this > particular system. Any idea on when it might make it to 3.10 stable or > 3.11? Or should I just revert back to 3.9? I missed that it's in fact in 3.11 and if I got Liu Bo right he's going to send it to 3.10 stable soon. Thanks, -Jan > Thanks! > > -BJ > > ----- Original Message ----- > > From: "Jan Schmidt" <list.btrfs@jan-o-sch.net> > Sent: Monday, July 29, 2013 3:21:51 AM > > Hi BJ, > > [original message rewrapped] > > On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote: >> (Apologies for the double post -- forgot to send as plain text the first time >> around, so the list rejected it.) >> >> I see that there's now a btrfs send / receive and I've tried using it, but >> I'm getting the oops I've pasted below, after which the FS becomes >> unresponsive (no I/O to the drive, no CPU usage, but all attempts to access >> the FS results in a hang). I have an internal drive (single drive) that >> contains 82GB of compressed data with a couple hundred snapshots. I tried >> taking the first snapshot and making a read only copy (btrfs subvolume >> snapshot -r) and then I connected an external USB drive and ran btrfs send / >> receive to that external drive. It starts working and gets a couple of GB in >> (I'd expect the first snapshot to be about 20GB) and then gets the following >> error. I had to use the latest copy of btrfs-progs from git, because the >> package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply >> returned "invalid argument" when trying to run btrfs send / receive. 
Thanks >> in advance for any info you may have. > > The problem has been introduced with rbtree ulists in 3.10, commit > > Btrfs: add a rb_tree to improve performance of ulist search > > You should be safe to revert that commit, it's a performance optimization > attempt. Alternatively, you can apply the published fix > > Btrfs: fix crash regarding to ulist_add_merge > > It has not made it into 3.10 stable or 3.11, yet, but is contained in Josef's > btrfs-next > > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git > > Thanks, > -Jan > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 24+ messages in thread
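The sequence of commands in the message above can be collected into a small script. This is a hedged sketch of that workflow, not a tested backup tool: the paths and the storageROTEMP naming are taken from the message, while the `DRY_RUN` switch and the `full_send` function name are additions for previewing the commands safely.

```shell
#!/bin/sh
# Sketch of the full-send workflow described above: make a read-only
# copy of a writable snapshot, stream it whole to the backup volume,
# then give it its real name there. With DRY_RUN set, the commands are
# only printed instead of executed.

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

full_send() {
    src=$1; dst=$2; name=$3
    # 1. btrfs send needs a read-only subvolume, so snapshot -r first.
    run btrfs subvolume snapshot -r "$src/$name" "$src/storageROTEMP"
    # 2. Stream the whole subvolume to the backup filesystem.
    if [ -n "$DRY_RUN" ]; then
        echo "btrfs send $src/storageROTEMP | btrfs receive $dst"
    else
        btrfs send "$src/storageROTEMP" | btrfs receive "$dst"
    fi
    # 3. Give the received snapshot its real name, drop the temp one.
    run mv "$dst/storageROTEMP" "$dst/$name"
    run btrfs subvolume delete "$src/storageROTEMP"
}

# Example (dry run; prints the four commands without touching anything):
DRY_RUN=1 full_send /home/data/snapshots /mnt/backup/snapshots storage@NIGHTLY20101201
```

Running it with `DRY_RUN` set prints the same four-step sequence quoted in the thread, which makes it easy to check the paths before pointing it at real volumes.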
* Re: Cloning a Btrfs partition
  2013-08-19 20:45 ` BJ Quinn
@ 2013-08-20  9:59 ` Xavier Bassery
  2013-08-20 15:43   ` BJ Quinn
  0 siblings, 1 reply; 24+ messages in thread
From: Xavier Bassery @ 2013-08-20 9:59 UTC (permalink / raw)
  To: BJ Quinn; +Cc: psusi, Jan Schmidt, linux-btrfs, Freddie Cash, bo li liu

On Mon, 19 Aug 2013 15:45:32 -0500 (CDT)
BJ Quinn <bj@placs.net> wrote:

> Ok, so the fix is now in 3.10.6 and I'm using that. I don't get the
> hang anymore, but now I'm having a new problem.
>
> Mount options --
>
> rw,noatime,nodiratime,compress-force=zlib,space_cache,inode_cache,ssd
>
> I need compression because I get a very high compression ratio with
> my data and I have lots of snapshots, so it's the only way it can all
> fit. I have an ssd and 24 cores anyway, so it should be fast. I need
> compress-force because I have lots of files in my data which compress
> typically by a 10:1 or 20:1 ratio, but btrfs likes to see them as
> incompressible, so I need the compress-force flag. I've just heard
> good things about space_cache and inode_cache, so I've enabled them.
> The ssd option is because I do have an ssd, but I have DRBD on top of
> it, and it looked like btrfs could not automatically detect that it
> was an ssd (rotation speed was showing as "1").
>
> Using newest btrfs-progs from git, because newest shipping
> btrfs-progs on CentOS 6 returns an error for invalid argument.
>
> I have a filesystem with maybe 1000 snapshots. They're daily
> snapshots of a filesystem that is about 24GB compressed. The total
> space usage is 323GB out of 469GB on an Intel SSD.
>
> All the snapshots are writable, so I know I have to create a readonly
> snapshot to copy to a backup drive.

Hi BJ,

I am curious to know why you use writable snapshots instead of
read-only? When I use snapshots as a base for backups, I create them
read-only, so that I don't need to worry something might have
accidentally changed in any of those.
I only use writable ones in cases when I actually need to write to them
(e.g. doing an experimental upgrade on a system root subvolume).

As a bonus, this would save you the need to:
1. create a ro snapshot of your rw one
2. rename the sent snapshot on the destination fs to a meaningful name.

>
> btrfs subvolume snapshot
> -r /home/data/snapshots/storage\@NIGHTLY20101201 /home/data/snapshots/storageROTEMP
>
> Then I send the snapshot to the backup drive, mounted with the same
> mount options.
>
> btrfs send /home/data/snapshots/storageROTEMP | btrfs
> receive /mnt/backup/snapshots/
>
> This takes about 5 hours to transfer 24GB compressed. Uncompressed it
> is about 150GB. There is a "btrfs" process that takes 100% of one
> core during this 5 hour period. There are some btrfs-endio and other
> processes that are using small amounts of more than one core, but the
> "btrfs" process always takes 100% and always only takes one core. And
> iostat clearly shows no significant disk activity, so we're
> completely waiting on the btrfs command. Keep in mind that the source
> filesystem is on an SSD, so it should be super fast. The destination
> filesystem is on a hard drive connected via USB 2.0, but again,
> there's no significant disk activity. Processor is a dual socket
> Xeon E5-2420.

5 hours for 150GB, meaning you only get ~8MB/s to your USB2 external HD
(instead of the ~25MB/s you could expect from USB2), is indeed rather
slow. But as you have noticed, your bottleneck here is the CPU, which I
guess you find frustrating given how powerful your system is (2 x 6
cores + hyperthreading = 24 threads). Your case may illustrate the need
for more parallelism...

My guess is that the poor performance stems from your choice of the
'compress-force=zlib' mount option. First, zlib compression is known to
be slower than lzo while able to give higher compression ratios.
Secondly, 'compress-force', while giving you even better compression,
means that your system will also compress already highly compressed
files (and potentially big and/or numerous ones).

To sum up, you have chosen space efficiency at the cost of performance
because of the lack of parallelism in this particular use case (so your
multi-core system cannot help).

>
> Then I try to copy another snapshot to the backup drive, hoping that
> it will keep the space efficiency of the snapshots.
>
> mv /mnt/backup/snapshots/storageROTEMP /mnt/backup/snapshots/storage\@NIGHTLY20101201
> btrfs subvolume delete /home/data/snapshots/storageROTEMP
> btrfs subvolume snapshot
> -r /home/data/snapshots/storage\@NIGHTLY20101202 /home/data/snapshots/storageROTEMP
> btrfs send /home/data/snapshots/storageROTEMP | btrfs
> receive /mnt/backup/snapshots/
>
> This results in a couple of problems. First of all, it takes 5 hours
> just like the first snapshot did. Secondly, it takes up another ~20GB
> of data, so it's not space efficient (I expect each snapshot should
> add far less than 500MB on average due to the math on how many
> snapshots I have and how much total space usage I have on the main
> filesystem).

It is not surprising that it takes another 5 hours, because you've sent
a full copy of your new snapshot made at day+1! What you should have
done instead is:

btrfs send -p <path_of_parent_snapshot> <path_of_next_snapshot>

so in your case that would be:

btrfs send -p [...]20101201 [...]20101202 | btrfs receive <path_to_backup_volume>

(I have omitted your paths in the above for clarity). For this to work,
you need to use read-only dated snapshots.

> Finally, it doesn't even complete without error. I get
> the following error after about 5 hours --
>
> At subvol /home/data/snapshots/storageROTEMP
> At subvol storageROTEMP
> ERROR: send ioctl failed with -12: Cannot allocate memory
> ERROR: unexpected EOF in stream.

I am not competent enough to explain this error.
>
> So in the end, unless I'm doing something wrong, btrfs send is much
> slower than just doing a full rsync of the first snapshot, and then
> incremental rsyncs with the subsequent ones. That and btrfs send
> doesn't seem to be space efficient here (again, unless I'm using it
> incorrectly).

At least you were right in supposing you were not using it correctly :p

Best regards,
Xavier

^ permalink raw reply	[flat|nested] 24+ messages in thread
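The `-p` suggestion above generalizes to a loop over a set of read-only, date-named snapshots: the first one is sent whole, each subsequent one as a delta against its predecessor. A sketch under those assumptions follows; the NIGHTLY naming and paths are carried over from the thread, and the `send_all` function name and `DRY_RUN` switch are illustrative additions, not part of btrfs-progs.

```shell
#!/bin/sh
# Sketch of an incremental send loop: first snapshot sent whole, each
# later one as a delta against its parent via `btrfs send -p`. Assumes
# read-only snapshots whose names sort in date order. With DRY_RUN set,
# the commands are only printed, so the pairing can be inspected.

send_all() {
    src=$1; dst=$2
    prev=""
    for snap in "$src"/storage@NIGHTLY*; do
        [ -e "$snap" ] || continue          # glob matched nothing
        if [ -n "$DRY_RUN" ]; then
            echo "btrfs send ${prev:+-p $prev }$snap | btrfs receive $dst"
        else
            btrfs send ${prev:+-p "$prev"} "$snap" | btrfs receive "$dst"
        fi
        prev=$snap
    done
}

# Example (dry run; prints one send|receive pipeline per snapshot):
DRY_RUN=1 send_all /home/data/snapshots /mnt/backup/snapshots
```

Because each delta only carries blocks changed since the parent, this is what preserves the space efficiency of the snapshots on the backup volume, instead of a full ~20GB copy per day.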
* Re: Cloning a Btrfs partition
  2013-08-20  9:59 ` Xavier Bassery
@ 2013-08-20 15:43 ` BJ Quinn
  0 siblings, 0 replies; 24+ messages in thread
From: BJ Quinn @ 2013-08-20 15:43 UTC (permalink / raw)
  To: Xavier Bassery; +Cc: psusi, Jan Schmidt, linux-btrfs, Freddie Cash, bo li liu

The use of writable snapshots isn't necessary. It's just what I had to
start with. I'm sure I could switch to using read-only snapshots
exclusively to skip the additional steps.

As for the throughput, the disparity between the actual speed and the
speed I might expect to achieve is that much greater over USB 3.0 or
SATA.

I have sort of a strange data set. It's primarily legacy FoxPro DBF
files, which are mostly empty space. For some reason, btrfs thinks
they're incompressible when not using compress-force. Nearly everything
on my filesystem is compressible. There are a few directories with lots
of small files, but it's primarily large (100MB+) compressible
file-based databases. Anyway, due to choices made by btrfs with respect
to detecting compressibility, I'm required to use the compress-force
option.

As for the compression method, I will go ahead and try lzo and see how
much space I lose. It may be worth the tradeoff if I end up with better
performance. Perhaps lzo in conjunction with read-only snaps AND the
proper syntax for sending incremental snaps will make btrfs send work
for my situation. Thanks for the suggestions!!!

At any rate, it seems that btrfs send would benefit from parallelism if
it were at all reasonably possible to do so. I'm surprised ANY
compression method could really tax modern hardware to that extent.
-BJ ----- Original Message ----- From: "Xavier Bassery" <xavier@bartica.org> To: "BJ Quinn" <bj@placs.net> Cc: psusi@cfl.rr.com, "Jan Schmidt" <list.btrfs@jan-o-sch.net>, linux-btrfs@vger.kernel.org, "Freddie Cash" <fjwcash@gmail.com>, "bo li liu" <bo.li.liu@oracle.com> Sent: Tuesday, August 20, 2013 4:59:23 AM Subject: Re: Cloning a Btrfs partition On Mon, 19 Aug 2013 15:45:32 -0500 (CDT) BJ Quinn <bj@placs.net> wrote: > Ok, so the fix is now in 3.10.6 and I'm using that. I don't get the > hang anymore, but now I'm having a new problem. > > Mount options -- > > rw,noatime,nodiratime,compress-force=zlib,space_cache,inode_cache,ssd > > I need compression because I get a very high compression ratio with > my data and I have lots of snapshots, so it's the only way it can all > fit. I have an ssd and 24 cores anyway, so it should be fast. I need > compress-force because I have lots of files in my data which compress > typically by a 10:1 or 20:1 ratio, but btrfs likes to see them as > incompressible, so I need the compress-force flag. I've just heard > good things about space_cache and inode_cache, so I've enabled them. > The ssd option is because I do have an ssd, but I have DRBD on top of > it, and it looked like btrfs could not automatically detect that it > was an ssd (rotation speed was showing as "1"). > > Using newest btrfs-progs from git, because newest shipping > btrfs-progs on CentOS 6 returns an error for invalid argument. > > I have a filesystem with maybe 1000 snapshots. They're daily > snapshots of a filesystem that is about 24GB compressed. The total > space usage is 323GB out of 469GB on an Intel SSD. > > All the snapshots are writable, so I know I have to create a readonly > snapshot to copy to a backup drive. Hi BJ, I am curious to know why you use writable snapshots instead of read-only? When I use snapshots as a base for backups, I create them read-only, so that I don't need to worry something might have accidentally changed in any of those. 
I only use writable ones in cases when I actually need to write to them (e.g. doing an experimental upgrade on a system root subvolume). As a bonus, this would save you the need to: 1. create a ro snapshot of your rw one 2. rename the sent snapshot on the destination fs to a meaningful name. > > btrfs subvolume snapshot > -r /home/data/snapshots/storage\@NIGHTLY20101201 /home/data/snapshots\storageROTEMP > > Then I send the snapshot to the backup drive, mounted with the same > mount options. > > btrfs send /home/data/snapshots/storageROTEMP | btrfs > receive /mnt/backup/snapshots/ > > This takes about 5 hours to transfer 24GB compressed. Uncompressed it > is about 150GB. There is a "btrfs" process that takes 100% of one > core during this 5 hour period. There are some btrfs-endio and other > processes that are using small amounts of more than one core, but the > "btrfs" process always takes 100% and always only takes one core. And > iostat clearly shows no significant disk activity, so we're > completely waiting on the btrfs command. Keep in mind that the source > filesystem is on an SSD, so it should be super fast. The destination > filesystem is on a hard drive connected via USB 2.0, but again, > there's no significant disk activity. Processor is a dual socket > Xeon E5-2420. 5 hours for 150GB, meaning you only get ~8MB/s to your USB2 external HD (instead of the ~25MB/s you could expect from USB2) is indeed rather slow. But as you have noticed, your bottleneck here is cpu-bound, which I guess you find frustrating given how powerful your system is (2 x 6 cores cpu + hyperthreading = 24 threads). Your case may illustrate the need for more parallelism... My guess is that the poor performance stems from your choice of 'compress-force=zlib' mount option. First, zlib compression is known to be slower than lzo while able to give higher compression ratios. 
Secondly, 'compress-force' while giving you even better compression means that your system will also compress already highly compressed files (and potentially big and/or numerous). To sum up, you have chosen space efficiency at the cost of performance because of the lack of parallelism in this particular use case (so your multi-core system cannot help). > > Then I try to copy another snapshot to the backup drive, hoping that > it will keep the space efficiency of the snapshots. > > mv /mnt/backup/snapshots/storageROTEMP /mnt/backup/snapshots/storage\@NIGHTLY20101201 > btrfs subvolume delete /home/data/snapshots/storageROTEMP > btrfs subvolume snapshot > -r /home/data/snapshots/storage\@NIGHTLY20101202 /home/data/snapshots/storageROTEMP > btrfs send /home/data/snapshots/storageROTEMP | btrfs > receive /mnt/backup/snapshots/ > > This results in a couple of problems. First of all, it takes 5 hours > just like the first snapshot did. Secondly, it takes up another ~20GB > of data, so it's not space efficient (I expect each snapshot should > add far less than 500MB on average due to the math on how many > snapshots I have and how much total space usage I have on the main > filesystem). It is not surprising that it takes another 5 hours, because you've sent a full copy of your new snapshot made at day+1! What you should have done instead is : btrfs send -p <path_of_parent_snapshot> <path_of_next_snapshot>, so in your case that would be: btrfs send -p [...]20101201 [...]20101202 | btrfs receive <path_to_backup_volume> (I have omitted your paths in the above for clarity). For this to work, you need to use read-only dated snapshots. > Finally, it doesn't even complete without error. I get > the following error after about 5 hours -- > > At subvol /home/data/snapshots/storageROTEMP > At subvol storageROTEMP > ERROR: send ioctl failed with -12: Cannot allocate memory > ERROR: unexpected EOF in stream. I am not competent enough to explain this error. 
> > So in the end, unless I'm doing something wrong, btrfs send is much > slower than just doing a full rsync of the first snapshot, and then > incremental rsyncs with the subsequent ones. That and btrfs send > doesn't seem to be space efficient here (again, unless I'm using it > incorrectly). At least you were right supposing you were not using it correctly :p Best regards, Xavier ^ permalink raw reply [flat|nested] 24+ messages in thread
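Trying lzo as discussed amounts to a mount-option change. A hedged sketch follows; the `/home/data` mount point and `<fs-uuid>` are placeholders (the thread never states where the filesystem is mounted), and whether lzo's lower compression ratio is still acceptable for this data set is exactly the open question.

```shell
# Try lzo in place of zlib: typically much faster to compress, at some
# cost in ratio. A remount only affects newly written data; existing
# extents keep whatever compression they were written with.
mount -o remount,compress-force=lzo /home/data

# Persistent variant for /etc/fstab, other options carried over from
# the thread (<fs-uuid> is a placeholder):
# UUID=<fs-uuid>  /home/data  btrfs  rw,noatime,nodiratime,compress-force=lzo,space_cache,inode_cache,ssd  0 0
```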
end of thread, other threads:[~2013-08-20 15:43 UTC | newest]

Thread overview: 24+ messages -- links below jump to the message on this page:
[not found] <bb747e0c-d6d8-4f60-a3f6-cf64c856515e@mail.placs.net>
2011-12-07 18:35 ` Cloning a Btrfs partition BJ Quinn
2011-12-07 18:39   ` Freddie Cash
2011-12-07 18:49     ` BJ Quinn
2011-12-08 15:49       ` Phillip Susi
2011-12-08 16:07         ` BJ Quinn
2011-12-08 16:09           ` Jan Schmidt
2011-12-08 16:28             ` BJ Quinn
2011-12-08 16:41               ` Jan Schmidt
2011-12-08 19:56                 ` BJ Quinn
2011-12-08 20:05                   ` Chris Mason
2011-12-08 20:38                     ` BJ Quinn
2011-12-12 21:41                       ` BJ Quinn
2011-12-13 22:06                         ` Goffredo Baroncelli
2011-12-30  0:25                           ` BJ Quinn
2012-01-12  0:52                             ` BJ Quinn
2012-01-12  6:41                               ` Chris Samuel
2011-12-08 16:27       ` Stephane CHAZELAS
2011-12-08 10:00   ` Stephane CHAZELAS
2011-12-08 19:22     ` Goffredo Baroncelli
2013-07-29  8:21 Fwd: " Jan Schmidt
2013-07-29 15:32 ` BJ Quinn
2013-07-30 10:28   ` Jan Schmidt
2013-08-19 20:45     ` BJ Quinn
2013-08-20  9:59       ` Xavier Bassery
2013-08-20 15:43         ` BJ Quinn