linux-btrfs.vger.kernel.org archive mirror
From: David Pottage <david@chrestomanci.org>
To: Marc MERLIN <marc@merlins.org>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs snapshot sizes
Date: Fri, 09 May 2014 08:42:22 +0100
Message-ID: <536C86DE.9090003@chrestomanci.org>
In-Reply-To: <20140507111949.GT10159@merlins.org>

On 07/05/14 12:19, Marc MERLIN wrote:
> So have others found a good way to have an idea about how much space is
> taken by each snapshot?
>
> I've tried quota trees, but I'm not sure how to read the output, or if it's
> correct (including the negative numbers some have mentioned). Are there
> other options?
>
> I think the main problem is that the shared data field is not working,
> making it harder to know which blocks are only used in a given snapshot.

In my understanding (devs please correct me if I am wrong), a snapshot 
is just a subvolume that happens to share a lot of data with another 
subvolume. The idea of taking regular snapshots to preserve the state of 
the filing system at a point in time is a userland concept. From the 
kernel's point of view the user has asked for a clone of a subvolume, 
and both copies are equal. What the user does with one or other clone 
after that is their affair.

This means that if you have a subvolume representing your home
directory that contains around 1GB of data, and you take daily
snapshots, then asking the kernel how big each snapshot is will not
give the answer you expect. They all contain roughly 1GB.

The question you should be asking is how two subvolumes compare (e.g.
the current /home and a snapshot taken of it last week): how much data
differs between the two? Depending on how you count, the "size" of the
snapshot will be either the total amount of data that is not shared,
or just the data that is in the snapshot but not in the base.
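
To make the two ways of counting concrete, here is a minimal sketch in
Python. It assumes a toy model in which a subvolume is just a mapping
from an extent id to a size in bytes. The extent ids, the sizes and
the function names are all invented for illustration; none of this is
a btrfs interface.

```python
# Toy model: a subvolume is a dict of {extent_id: size_in_bytes}.
# Extent ids and sizes are made up; this does not talk to btrfs.

def snapshot_only_bytes(snapshot, base):
    """Bytes in extents referenced by the snapshot but not by the base."""
    return sum(size for ext, size in snapshot.items() if ext not in base)

def unshared_bytes(snapshot, base):
    """Bytes in extents referenced by exactly one of the two subvolumes."""
    return snapshot_only_bytes(snapshot, base) + snapshot_only_bytes(base, snapshot)

home = {"a": 400, "b": 300, "d": 250}       # current /home
last_week = {"a": 400, "b": 300, "c": 500}  # snapshot taken last week

print(snapshot_only_bytes(last_week, home))  # 500
print(unshared_bytes(last_week, home))       # 750
```

The first number answers "what would I get back by deleting the
snapshot, if nothing else referenced that data?", while the second
also counts what has been added to the base since the snapshot.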

The thing is, I don't think there is an easy way to get a report of the
amount of non-shared data without walking the file systems in both
subvolumes and building a large data structure of inodes or suchlike.
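
As a rough illustration of what such a walk might look like, here is a
sketch in Python that indexes two directory trees and sums the sizes
of files that are in the snapshot but are missing from, or look
different in, the base. It is only a heuristic: it compares size and
modification time per path and does not inspect extents, so it cannot
tell whether the data is really shared on disc. The function names are
my own.

```python
import os

def index_tree(root):
    """Map each regular file's path (relative to root) to (size, mtime_ns)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.lstat(full)
            index[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return index

def changed_bytes(snapshot_root, base_root):
    """Sum sizes of snapshot files that are absent from, or differ from, the base.

    "Differ" here means a different size or mtime, which is an
    assumption standing in for a real shared-extent check.
    """
    snap = index_tree(snapshot_root)
    base = index_tree(base_root)
    return sum(size for path, (size, mtime) in snap.items()
               if base.get(path) != (size, mtime))
```

Calling changed_bytes() with the snapshot path first and the current
subvolume second gives an estimate of the "in the snapshot but not the
base" figure. The index holds one entry per file, which is the large
data structure I mean.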

Measuring the size of snapshots gets even more thorny when you take
many snapshots. For example, suppose you take one every hour, and you
have just deleted a large file. All your old hourly snapshots will
contain a reference to that large file, but the data will only be on
disc once, so you don't want to count its size more than once when
considering how much of your disc is taken up by snapshots.
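
The double-counting problem can be sketched in Python with the same
kind of toy model, where each subvolume is a mapping from an invented
extent id to a size in bytes. Again, the names and numbers are
assumptions for illustration, not anything btrfs reports.

```python
# Toy model: each subvolume is a dict of {extent_id: size_in_bytes}.

def naive_snapshot_bytes(live, snapshots):
    """Add up, per snapshot, the extents the live subvolume no longer has.

    An extent referenced by several snapshots is counted several times.
    """
    return sum(size
               for snap in snapshots
               for ext, size in snap.items()
               if ext not in live)

def snapshot_bytes_counted_once(live, snapshots):
    """Bytes held only by snapshots, with each extent counted a single time."""
    held = {}
    for snap in snapshots:
        for ext, size in snap.items():
            if ext not in live:
                held[ext] = size  # same extent id overwrites, never adds
    return sum(held.values())

live = {"docs": 100}  # the large file has just been deleted
hourly = [{"docs": 100, "bigfile": 1000} for _ in range(3)]

print(naive_snapshot_bytes(live, hourly))         # 3000
print(snapshot_bytes_counted_once(live, hourly))  # 1000
```

Three hourly snapshots each reference the deleted file, so the naive
sum reports three times the space that is actually in use.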

NB: I am not a btrfs developer, just an interested user, and lurker on 
this list.

-- 
David Pottage
