From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
Date: Sun, 28 May 2017 04:55:21 +0000 (UTC)
Message-ID: <pan$639c2$50c928c5$2b2a8c90$6b4270a1@cox.net>
In-Reply-To: CADzmB23zKnH35mW8jVPW3Z552t7Kw3wWhRn+x0jxtT3tCcCS9Q@mail.gmail.com

Ivan P posted on Sat, 27 May 2017 22:54:31 +0200 as excerpted:

>>>>> Please add me to CC when replying, as I am not
>>>>> subscribed to the mailing list.

> Hmm, remounting as you suggested has shut it up immediately - hurray!
> 
> I don't really have any special write pattern from what I can tell.
> About the only thing different from all the other btrfs systems I've set
> up is that the data is also on the same volume as the system. Normal
> usage, no VMs or heavy file generation. I'm also only taking snapshots
> of the system and @home, with the latter only containing my .config,
> .cache and symlinks to some folders in @data.

Systemd?  Journald with journals on btrfs?  Regularly snapshotting that 
subvolume?

If yes to all of the above, that might be the issue.  Normally systemd 
will set the journal directory NOCOW, so the journal files inherit it at 
creation, in order to avoid the heavy fragmentation caused by the journal 
files' COW-unfriendly, database-style file-internal-rewrite pattern.

Great.  Except that snapshotting locks the existing version of the file 
in place with the snapshot, so the next write to any block must be COW 
anyway.  This is sometimes referred to as COW1, since it's a single-time 
COW, and the effect isn't too bad with a one-time snapshot.  But if 
you're regularly snapshotting the journal files, that will trigger COW1 
on every snapshot, which if you're snapshotting often enough can be 
almost as bad as regular COW in terms of fragmentation.

The fix is to make the journal dir a subvolume instead.  Snapshots stop 
at subvolume boundaries, so that excludes it from snapshots taken of the 
parent subvolume.  Then just don't snapshot the journal subvolume itself, 
and the NOCOW that systemd should already set on that dir and its 
contents will actually be NOCOW, without repeated snapshots forcing COW1.
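
As a rough illustration of that conversion, here is a minimal sketch that 
only builds and prints the commands for review, without running anything.  
The function name and the .old suffix are hypothetical, /var/log/journal 
is assumed to be where the persistent journal lives, and it assumes 
journald is not writing to that directory while the steps run (how you 
ensure that is left open here).

```python
import shlex


def journal_subvolume_plan(journal_dir="/var/log/journal"):
    """Return the commands (as argv lists) for turning journal_dir into a subvolume.

    Nothing is executed here; the caller reviews the plan and runs it as root.
    """
    old_dir = journal_dir + ".old"  # hypothetical name for the moved-aside directory
    return [
        # Move the existing plain directory out of the way.
        ["mv", journal_dir, old_dir],
        # Create a subvolume in its place; snapshots of the parent stop here.
        ["btrfs", "subvolume", "create", journal_dir],
        # Set NOCOW on the new directory so files created in it inherit it.
        ["chattr", "+C", journal_dir],
        # Copy the old journals as real copies (not reflinks), so the new
        # files are created fresh inside the NOCOW directory.
        ["cp", "-a", "--reflink=never", old_dir + "/.", journal_dir + "/"],
        # Show the directory attributes so the C flag can be checked.
        ["lsattr", "-d", journal_dir],
    ]


if __name__ == "__main__":
    for argv in journal_subvolume_plan():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the plan rather than executing it is deliberate: these steps 
touch live log storage, so they are better read and adapted than run 
blindly.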


Of course an alternative fix, the one I use here (and am happy with), is 
to have a normal syslog (I use syslog-ng, but others have reported using 
rsyslog) handle your saved logs in traditional text form (most modern 
syslogs should cooperate with systemd's journald), and to configure 
journald to use only tmpfs (see the journald.conf manpage).  
Traditional text logs are append-only and not nearly as bad in COW 
terms.  Meanwhile, journald is still active, just writing to tmpfs only, 
so you get a journal for the current boot session and thus can still take 
advantage of all the usual systemd/journald features such as systemctl 
status spitting out the last 10 log entries for that service, etc.  It's 
just limited to the current boot session, and you use the normal text 
logs for anything older than that.  For me anyway that's the best of both 
worlds, and I don't have to worry about how the journal files behave on 
btrfs at all, because they're not written to btrfs at all. =:^)
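
For reference, the journald.conf setting that keeps journals in memory 
only is Storage=volatile in the [Journal] section (journals then live 
under /run/log/journal).  A minimal sketch that renders such a drop-in 
follows; the helper names are hypothetical and the target path is whatever 
the caller passes in, so check the journald.conf manpage for where your 
distribution expects drop-ins.

```python
from pathlib import Path


def render_journald_dropin(storage="volatile"):
    """Return the text of a journald drop-in that selects the storage mode."""
    return "[Journal]\nStorage={}\n".format(storage)


def write_journald_dropin(path, storage="volatile"):
    """Write the drop-in to path, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_journald_dropin(storage))
    return target


if __name__ == "__main__":
    # Print the fragment for review instead of touching /etc.
    print(render_journald_dropin(), end="")
```

journald has to be restarted, or the system rebooted, before a changed 
Storage= setting takes effect.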


Meanwhile, since you mentioned snapshots, a word of caution there.  If 
you do have scripted snapshots being taken, be sure you have a script 
thinning down your snapshot history as well.  More than 200-300 snapshots 
per subvolume scales very poorly in btrfs maintenance terms (and qgroups 
make the problem far worse, if you have them active at all).  But if for 
instance you're taking snapshots every hour and need something from, say, 
a month ago, are you really going to remember or care which exact hour it 
was?  Or will the daily either before or after that hour be fine, and 
actually much easier to find if you've trimmed to daily by then, as 
opposed to having hundreds and hundreds of hourly snapshots accumulating?

So snapshots are great but they don't come without cost, and if you keep 
under 200 and if possible under 100 per subvolume, you'll find 
maintenance such as balance and check (fsck) go much faster than they do 
with even 500, let alone thousands.
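
The thinning policy described above can be sketched roughly as follows.  
This is only an illustration of the selection logic, not a complete 
tool: the function name and the two retention windows (hourly for 2 days, 
daily for 30) are assumptions, and it works on plain timestamps rather 
than on real snapshot paths.

```python
from datetime import datetime, timedelta


def thin_snapshots(timestamps, now,
                   keep_hourly_for=timedelta(days=2),
                   keep_daily_for=timedelta(days=30)):
    """Split snapshot timestamps into (keep, drop) lists.

    Everything newer than keep_hourly_for is kept.  Between that and
    keep_daily_for, only the newest snapshot of each calendar day is kept.
    Anything older is dropped.
    """
    keep, drop = [], []
    days_kept = set()
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age <= keep_hourly_for:
            keep.append(ts)
        elif age <= keep_daily_for and ts.date() not in days_kept:
            days_kept.add(ts.date())
            keep.append(ts)
        else:
            drop.append(ts)
    return keep, drop


if __name__ == "__main__":
    now = datetime(2017, 5, 28)
    # Forty days of hourly snapshots, as an example input.
    hourly = [now - timedelta(hours=h) for h in range(1, 40 * 24 + 1)]
    keep, drop = thin_snapshots(hourly, now)
    print(len(keep), "kept,", len(drop), "to delete")
```

A real script would map each dropped timestamp back to its snapshot and 
remove it with btrfs subvolume delete.  With windows like these, even 
hourly snapshotting stays under the 100-per-subvolume mark.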

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

