From: James Pharaoh <james@pharaoh.uk>
To: Hugo Mills <hugo@carfax.org.uk>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS backup questions
Date: Mon, 29 Sep 2014 13:02:35 +0200
Message-ID: <54293C4B.5080201@pharaoh.uk>
In-Reply-To: <20140927165929.GC7191@carfax.org.uk>

On 27/09/14 18:59, Hugo Mills wrote:
>>>> 2. Duplicating NOCOW files
>     Are you trying to cross a mount-point with that? It works for me:

Here's a script which replicates what I'm doing:

https://gist.github.com/jamespharaoh/d693067ffd203689ebea

And here's the output when I run it:

https://gist.github.com/jamespharaoh/75cb937fd73b05c9128d

>     Be aware that the current implementation of (manual) defrag will
> separate the shared extents, so you no longer get the deduplication
> effect. There was a snapshot-aware defrag implementation, but it
> caused filesystem corruption, and has been removed for now until a
> working version can be written. I think Josef was working on this.

Yeah, good to know, but it won't be a major problem. I'll probably leave
COW on in almost all cases, even for database files: I'll defragment
those files and deduplicate all the rest. For very large sites, which
will be rare, I'll use nocow for those files and provision replication
or whatever.
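
A minimal sketch of that plan, assuming `chattr +C` is used to mark a
directory nocow and `btrfs filesystem defragment -r` is used for the
COW case. The helper name and the paths are hypothetical, and the
function only builds the command lines; it does not run them.

```python
from typing import List


def layout_commands(path: str, very_large: bool) -> List[List[str]]:
    """Return the commands (as argv lists) for one database directory.

    Hypothetical helper for illustration; nothing is executed here.
    """
    if very_large:
        # chattr +C marks the directory No_COW. Only files created in it
        # afterwards inherit the attribute, so this has to be done before
        # the database files are written.
        return [["chattr", "+C", path]]
    # Default case: keep COW and defragment the existing files recursively.
    return [["btrfs", "filesystem", "defragment", "-r", path]]


if __name__ == "__main__":
    # Hypothetical paths, one for each case.
    for argv in layout_commands("/srv/db/small-site", very_large=False):
        print(" ".join(argv))
    for argv in layout_commands("/srv/db/large-site", very_large=True):
        print(" ".join(argv))
```

Deduplication of the remaining files is left out, since no particular
tool has been chosen for it here.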

I'll do some performance testing at some point and post some code and 
the results here ;-)

>> Yes, this is one of my main inspirations. The problem is that I am pretty
>> sure it won't handle deduplication of the data.
>     It does. That's one of the things it's explicitly designed to do.

Ok, so I think I understand this now. I believe that the only type of
object with a universal ID is a subvolume, so the receive function
either can't identify items which already exist on the target by itself,
or it would be expensive for it to do so.

Providing a "parent" subvolume allows it to do that. So as long as the
parent subvolume shares data with the filesystem being sent, the two
will still share that data on the target after the receive takes place.

I think the issue for me is the word "parent". These are really 
"reference" filesystems.

The subvolumes you've told me to list as the parents are not parent 
filesystems at all, compared to the one I'm sending, except for the 
previous version of the same subvolume of course.
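
A minimal sketch of that distinction on the command line, assuming the
`-p` (parent) and `-c` (clone source) options of `btrfs send`. The
helper name and snapshot paths are hypothetical, and the function only
builds the argument list; it does not run anything.

```python
from typing import List, Optional, Sequence


def send_command(
    snapshot: str,
    parent: Optional[str] = None,
    clone_sources: Sequence[str] = (),
) -> List[str]:
    """Build a `btrfs send` argv list for one read-only snapshot.

    parent:        the previous version of the same subvolume (-p).
    clone_sources: other snapshots that may share data with this one
                   (-c), i.e. the "reference" subvolumes.
    """
    argv = ["btrfs", "send"]
    if parent is not None:
        argv += ["-p", parent]
    for src in clone_sources:
        argv += ["-c", src]
    argv.append(snapshot)
    return argv


if __name__ == "__main__":
    # Hypothetical layout: site-a.2 follows site-a.1, and site-b.1 is an
    # unrelated subvolume that happens to share deduplicated data.
    print(" ".join(send_command(
        "/snapshots/site-a.2",
        parent="/snapshots/site-a.1",
        clone_sources=["/snapshots/site-b.1"],
    )))
```

The resulting stream would be piped into `btrfs receive` on the target,
where the parent and clone-source snapshots need to exist already.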

Is that all correct?

James
