Linux Btrfs filesystem development
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Where is the disk space?
Date: Fri, 13 Nov 2015 21:00:09 +0000 (UTC)
Message-ID: <pan$18dab$9511f910$5c72bde7$8748826@cox.net>
In-Reply-To: 20151113200158.GA18943@merlins.org

Marc MERLIN posted on Fri, 13 Nov 2015 12:01:58 -0800 as excerpted:

> I'm still seeing 39GB used for 28GB of actual data, but I definitely
> fixed one bit already thanks to you.

For the data side, I think I understand what's going on with the space, 
but I haven't mastered the concept well enough to be confident I can 
explain it well.  Nevertheless, here's a go at it.  One of the devs 
posted an explanation complete with nice ASCII diagrams, if you're 
interested in looking it up.

What happens is this.  Larger files (particularly VM images and the 
like, if they're not set nocow, or if they're nocow but cow1-ed due to 
snapshotting, which AFAIK fits your use-case directly) are originally 
laid out in a few reasonably large extents.  As rewrites occur, they cow, 
and thus unmap random smaller ranges from within those larger extents.  
But btrfs doesn't do extent splitting, so each larger extent remains 
pinned in full as long as at least some of the data within it remains 
referenced.  The result can eventually be rather cavernous, mostly empty 
original extents still pinned in place by the few 4K blocks that have 
never been rewritten.
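
To put rough numbers on that, here's a toy model in Python.  It is only 
a sketch of the accounting described above, not btrfs code: the 
pinned_bytes helper, the 4K block size and the 128 MiB extent are all 
assumptions picked for illustration.

```python
# Toy model, not btrfs code: with no extent splitting, an extent stays
# allocated in full until no block inside it is referenced any more.
BLOCK = 4096  # assumed block size in bytes


def pinned_bytes(extent_blocks, rewritten):
    """Return (referenced, allocated) bytes for one original extent.

    extent_blocks: number of blocks in the original extent
    rewritten: set of block indexes that were cow-rewritten elsewhere
    """
    live = extent_blocks - len(rewritten & set(range(extent_blocks)))
    # Any surviving reference keeps the whole extent allocated.
    allocated = extent_blocks * BLOCK if live > 0 else 0
    return live * BLOCK, allocated


# A hypothetical 128 MiB extent where all but 3 blocks were rewritten.
blocks = 128 * 1024 * 1024 // BLOCK
referenced, allocated = pinned_bytes(blocks, set(range(blocks - 3)))
print(referenced, allocated)
```

In this made-up case 12 KiB of still-referenced data keeps the full 
128 MiB allocated, and the rewritten blocks take up new space on top of 
that.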

If you believe you know which files are likely to be the culprits, and if 
you're running VMs that's exactly where I'd look first, try temporarily 
moving them (and any old snapshots of them) off the filesystem, then do 
a quick btrfs fi df to see whether that has indeed emptied out some data 
chunks.  If so, run a balance -dusage=80 or whatever to try to reclaim 
them, then move the file(s) back.
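
As a rough illustration of why the move-out step matters before the 
balance: the usage filter only picks chunks whose usage is below the 
given percentage.  The sketch below models just that selection rule in 
Python; the chunks_selected helper and the per-chunk percentages are 
made up, and real numbers would have to come from your own filesystem.

```python
def chunks_selected(usages_percent, threshold=80):
    """Indexes of data chunks a usage filter would pick.

    Assumes the filter selects chunks whose used percentage is below
    the threshold, as with balance -dusage=80.
    """
    return [i for i, used in enumerate(usages_percent) if used < threshold]


# Hypothetical per-chunk usage before and after moving the files out.
before = [95, 90, 85, 88]
after = [95, 40, 10, 88]
print(chunks_selected(before))
print(chunks_selected(after))
```

With the made-up numbers, nothing qualifies before the move, while two 
chunks fall under the threshold afterwards and can be compacted.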

Of course if you still have other snapshots of the file pinning down 
those extents it won't free them, so you have to either delete the 
snapshots or at least delete the culprit file(s) from within them 
(obviously only with writable snapshots) in order for this to work.
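
The snapshot point can be sketched the same way.  This is a toy 
reference model in Python, with hypothetical extent and path names, and 
it says nothing about how btrfs tracks references internally: an extent 
only becomes free once the last file referencing it is gone.

```python
# Toy reference model: extent name -> set of files still referencing it.
refs = {"extent-A": {"live/vm.img", "snap-2015-11-01/vm.img"}}


def drop(refs, path):
    """Remove one file's references; return the extents that became free."""
    freed = []
    for extent, holders in refs.items():
        holders.discard(path)
        if not holders:
            freed.append(extent)
    for extent in freed:
        del refs[extent]
    return freed


first = drop(refs, "live/vm.img")              # snapshot copy still pins it
second = drop(refs, "snap-2015-11-01/vm.img")  # last reference is gone
print(first, second)
```

Removing only the live copy frees nothing here; the extent is released 
once the snapshot's copy goes too.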

In theory, defrag can do much the same thing, assuming of course there 
aren't snapshots containing the same file and still locking down its old 
extents, but that doesn't get the file off the filesystem for the 
cleanup, so it's likely to be somewhat less effective, perhaps 
considerably less effective if the remaining free space inside existing 
data chunks is itself highly fragmented.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

