From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: free space inode generation (0) did not match free space cache generation
Date: Sat, 22 Mar 2014 23:32:16 +0000 (UTC) [thread overview]
Message-ID: <pan$b7ff$285d59f6$2419371$b3946ebe@cox.net> (raw)
In-Reply-To: 532DFDAB.7000600@friedels.name
Hendrik Friedel posted on Sat, 22 Mar 2014 22:16:27 +0100 as excerpted:
> I read through the FAQ you mentioned, but I must admit, that I do not
> fully understand.
My experience is that it takes a bit of time to soak in. Between time,
previous Linux experience, and reading this list for a while, things do
make more sense now, but my understanding has definitely changed and
deepened over time.
> What I am wondering about is, what caused this problem to arise. The
> filesystem was hardly a week old, never mistreated (powered down without
> unmounting or so) and not even half full. So what caused the data chunks
> all being allocated?
I can't really say, but it's worth noting that btrfs allocates chunks
on demand, but doesn't (yet?) automatically deallocate them. To
deallocate, you balance. Btrfs can reuse space freed within a chunk for
the same purpose (data chunks for data, metadata chunks for metadata),
but it can't convert an allocated chunk from one type to the other
without a balance.
So the most obvious trigger: if you copy a bunch of stuff around so the
filesystem nears full, then delete a bunch of it, check your btrfs
filesystem df/show stats and see whether you need a balance. But like I
said, that's the obvious case.
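Concretely, the check-and-reclaim sequence might look like this (the
mountpoint and the usage threshold are placeholders of mine, not from
the original report):

```shell
# Compare allocation to actual use ("/mnt" is a placeholder mountpoint):
btrfs filesystem show /mnt   # per-device "used" = space allocated to chunks
btrfs filesystem df /mnt     # per-type: total = allocated, used = occupied

# If data "total" is near the device size while "used" is well below it,
# a filtered balance reclaims mostly-empty chunks.  -dusage=20 rewrites
# only data chunks under 20% full, far cheaper than a full balance:
btrfs balance start -dusage=20 /mnt
```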
> The only thing that I could think of is that I created hourly snapshots
> with snapper.
> In fact in order to be able to do the balance, I had to delete something
> -so I deleted the snapshots.
One possibility off the top of my head: Do you have noatime set in your
mount options? That's definitely recommended with snapshotting, since
otherwise, atime updates will be changes to the filesystem metadata since
the last snapshot, and thus will add to the difference between snapshots
that must be stored. If you're doing hourly snapshots and are accessing
much of the filesystem each hour, that'll add up!
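For reference, a sketch of how noatime is set; the UUID and mountpoint
here are placeholders:

```shell
# /etc/fstab entry (device and mountpoint are placeholders):
UUID=xxxxxxxx-xxxx  /mnt/data  btrfs  noatime  0  0

# Or apply it to an already-mounted filesystem without a reboot:
mount -o remount,noatime /mnt/data
```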
Additionally, I recommend snapshot thinning. Hourly snapshots are nice
but after some time, they just become noise. Will you really know or
care which specific hour it was if you're having to retrieve a snapshot
from a month ago?
So hourly snapshots, but after say a day, delete two out of three,
leaving three-hourly snapshots. After two days, delete another half,
leaving six-hourly snapshots (four a day). After a week, delete three of
the four, leaving daily snapshots. After a quarter (13 weeks), delete six
of seven (or four of five if it's weekdays only), leaving weekly snapshots.
After a year, delete 12 of the 13, leaving quarterly snapshots. ... Or
something like that. You get the idea. Obviously script it, just like
the snapshotting itself is scripted.
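As an illustration only, the keep/delete decision for that schedule can
be sketched in plain POSIX sh. Snapshot naming and the actual btrfs
subvolume delete calls are omitted, so treat this as a policy sketch,
not a working tool:

```shell
#!/bin/sh
# keep_interval AGE_HOURS: print the spacing (in hours) at which
# snapshots of that age should be kept, per the schedule above.
keep_interval() {
    age=$1
    if   [ "$age" -lt 24 ];   then echo 1      # first day: hourly
    elif [ "$age" -lt 48 ];   then echo 3      # second day: 3-hourly
    elif [ "$age" -lt 168 ];  then echo 6      # up to a week: 6-hourly
    elif [ "$age" -lt 2184 ]; then echo 24     # up to a quarter (13 wk): daily
    elif [ "$age" -lt 8760 ]; then echo 168    # up to a year: weekly
    else                           echo 2184   # older: quarterly
    fi
}

# should_keep EPOCH_HOUR AGE_HOURS: keep a snapshot taken at hour-of-epoch
# EPOCH_HOUR iff that hour is a multiple of the interval for its age class.
should_keep() {
    [ $(( $1 % $(keep_interval "$2") )) -eq 0 ] && echo keep || echo delete
}
```

If I recall correctly, snapper's own timeline-cleanup limits can be
configured to do similar thinning, so check there before rolling your own.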
That will solve another problem too. When btrfs gets into the thousands
of snapshots, as it will pretty fast with unthinned hourlies, certain
operations slow down dramatically. The problem was much worse at one
point, before snapshot-aware defrag was disabled for the time being; it
simply didn't scale, and people with thousands of snapshots were seeing
balances or defrags run for days with little visible progress. But few
people really /need/ thousands of snapshots. With a bit of reasonable
thinning down to one a quarter, you end up with 200-300 snapshots and
that's it.
Also, it may or may not apply to you, but internally rewritten (as
opposed to simply appended) files are bad news for COW-based filesystems
such as btrfs. The autodefrag mount option can help with this for
smaller files (up to a few hundred megabytes), but for larger (from say
half a gig) actively rewritten files such as databases, VM images, and
pre-allocated torrent downloads before they finish, setting the NOCOW
attribute (chattr +C: change in place instead of the normal
copy-on-write) is strongly recommended. But the catch is that the
attribute needs to be set while the file is still zero-size, before it
actually has any content. The easiest way to do that is to create a
dedicated directory for such files and to set the attribute on the
directory, after which it'll automatically be inherited by any newly
created files or subdirs in that directory.
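In commands, that trick looks like this (the directory path is just an
example of mine):

```shell
mkdir /mnt/data/vm-images          # hypothetical path
chattr +C /mnt/data/vm-images      # new files created inside inherit NOCOW
lsattr -d /mnt/data/vm-images      # should list the 'C' attribute

# Note: cp into the directory works (the destination file is created
# zero-size, so it inherits +C), but mv within the same filesystem is
# just a rename and keeps the file's old COW extents.
```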
But, there's a catch with snapshots. The first change to a block after a
snapshot forces a COW anyway, since the data has changed from that of the
snapshot. So for those making heavy use of snapshots, creating dedicated
subvolumes for these NOCOW directories is a good idea, since snapshots
are per subvolume and thus these dedicated subvolumes will be excluded
from the general snapshots (just don't snapshot the dedicated subvolumes).
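So for the snapshot-heavy case, the setup becomes (again with an example
path of my own choosing):

```shell
btrfs subvolume create /mnt/data/vm-images   # hypothetical path
chattr +C /mnt/data/vm-images

# A snapshot of /mnt/data now excludes vm-images automatically,
# since btrfs snapshots stop at subvolume boundaries.
```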
Of course that does limit the value of snapshots to some degree, but it's
worth keeping in mind that most filesystems don't even offer the snapshot
feature at all, so...
> Can you tell me where I can read about the causes for this problem?
The above wisdom is mostly from reading the list for a while. Like I
said, it takes a while to soak in, and my thinking on the subject has
changed somewhat over time. The fact that NOCOW wasn't NOCOW on the
first change after a snapshot was a rather big epiphany to me, but AFAIK,
that's not on the wiki or elsewhere yet. It makes sense if you think
about it, but someone specifically asked, and the devs confirmed it.
Before that I had no idea, and was left wondering at some of the behavior
being reported, even with NOCOW properly set. (That was back when the
broken snapshot-aware defrag was still in place, since it simply didn't
scale with snapshots and such files, and I couldn't figure out why NOCOW
wasn't avoiding the problem, until a dev confirmed that the first change
after a snapshot is COW anyway. Then it all dropped into place:
continuously rewritten VM images, even if set NOCOW, would still be
continuously fragmented if people were doing regular snapshots of them.)
> Besides this:
> You recommend monitoring the output of btrfs fi show and to do a
> balance, whenever unallocated space drops too low. I can monitor this
> and let monit send me a message once that happens. Still, I'd like to
> know how to make this less likely.
I haven't had a problem with it here, but then I haven't been doing much
snapshotting (and always manual when I do it), I don't run any VMs or
large databases, I mounted with the autodefrag option from the beginning,
and I've used noatime for nearly a decade now, as it was also
recommended for my previous filesystem, reiserfs.
But regardless of my experience with my own usage pattern, I suspect
that with reasonable monitoring you'll eventually become familiar with
how fast chunks are allocated, and perhaps with what sorts of actions
(beyond the obvious, actively moving stuff around on the filesystem)
trigger those allocations for your specific usage pattern, and can then
adapt as necessary.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman