From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
Date: Sun, 28 May 2017 04:55:21 +0000 (UTC)
Message-ID: <pan$639c2$50c928c5$2b2a8c90$6b4270a1@cox.net>
In-Reply-To: CADzmB23zKnH35mW8jVPW3Z552t7Kw3wWhRn+x0jxtT3tCcCS9Q@mail.gmail.com
Ivan P posted on Sat, 27 May 2017 22:54:31 +0200 as excerpted:
>>>>> Please add me to CC when replying, as I am not
>>>>> subscribed to the mailing list.
> Hmm, remounting as you suggested has shut it up immediately - hurray!
>
> I don't really have any special write pattern from what I can tell.
> About the only thing different from all the other btrfs systems I've set
> up is that the data is also on the same volume as the system. Normal
> usage, no VMs or heavy file generation. I'm also only taking snapshots
> of the system and @home, with the latter only containing my .config,
> .cache and symlinks to some folders in @data.
Systemd? Journald with journals on btrfs? Regularly snapshotting that
subvolume?
If yes to all of the above, that might be the issue. Normally systemd
will set the journal directory NOCOW, so the journal files inherit it at
creation, in order to avoid heavy fragmentation due to the COW-unfriendly
database-style file-internal-rewrite pattern of the journal files.
Great. Except that snapshotting locks the existing version of the file
in place with the snapshot, so the next write to any block must be COW
anyway. This is sometimes referred to as COW1, since it's a single-time
COW, and the effect isn't too bad with a one-time snapshot. But if
you're regularly snapshotting the journal files, that will trigger COW1
on every snapshot, which if you're snapshotting often enough can be
almost as bad as regular COW in terms of fragmentation.
The fix is to make the journal dir a subvolume instead, thereby excluding
it from the snapshots taken of the parent subvolume, and then simply not
snapshot the journal subvolume. That way the NOCOW that systemd should
already set on that dir and its contents will actually be NOCOW,
without interference from repeated snapshotting forcing COW1.
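As a rough sketch of that fix, the snippet below only builds and prints the commands involved; it does not run them. The default path /var/log/journal, the ".old" backup name and the helper name journal_subvolume_plan are assumptions for illustration. A real conversion would also need journald stopped (or done from a rescue boot), and any old journal files you want to keep copied back afterward.

```python
from pathlib import PurePosixPath


def journal_subvolume_plan(journal_dir="/var/log/journal"):
    """Return the commands (as argv lists) for turning journal_dir into
    its own subvolume with NOCOW set. Nothing is executed here."""
    path = PurePosixPath(journal_dir)
    # Hypothetical backup name: move the existing plain directory aside.
    backup = path.with_name(path.name + ".old")
    return [
        ["mv", str(path), str(backup)],
        # A subvolume is not included in snapshots of its parent subvolume.
        ["btrfs", "subvolume", "create", str(path)],
        # New files created in a +C directory inherit NOCOW.
        ["chattr", "+C", str(path)],
    ]


for argv in journal_subvolume_plan():
    print(" ".join(argv))
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; review the commands against your own layout before running any of them as root.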
Of course an alternative fix, the one I use here (and am happy with)
instead, is to have a normal syslog (I use syslog-ng, but others have
reported using rsyslog) handling your saved logs in traditional text form
(most modern syslogs should cooperate with systemd's journald), and
configure journald to only use tmpfs (see the journald.conf manpage).
Traditional text logs are append-only and not nearly as bad in COW
terms. Meanwhile, journald is still active, just writing to tmpfs only,
so you get a journal for the current boot session and thus can still take
advantage of all the usual systemd/journald features such as systemctl
status spitting out the last 10 log entries for that service, etc. It's
just limited to the current boot session, and you use the normal text
logs for anything older than that. For me anyway that's the best of both
worlds, and I don't have to worry about how the journal files behave on
btrfs at all, because they're not written to btrfs at all. =:^)
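For reference, a minimal journald.conf fragment along those lines might look like the following. Storage=volatile keeps the journal under /run/log/journal only. The 64M cap is an arbitrary example value, and whether ForwardToSyslog is needed depends on how your syslog daemon collects messages, so treat both as assumptions to check against the manpage.

```ini
# /etc/systemd/journald.conf (fragment)
[Journal]
# Keep the journal in memory only (under /run/log/journal), never on disk.
Storage=volatile
# Example cap on tmpfs usage; pick a size that suits your RAM.
RuntimeMaxUse=64M
# Only needed if your syslog daemon does not read the journal directly.
ForwardToSyslog=yes
```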
Meanwhile, since you mentioned snapshots, a word of caution there. If
you do have scripted snapshots being taken, be sure you have a script
thinning down your snapshot history as well. More than 200-300 snapshots
per subvolume scales very poorly in btrfs maintenance terms (and qgroups
make the problem far worse, if you have them active at all). But if for
instance you're taking snapshots every hour and you need something from
one say a month old, are you really going to remember or care which exact
hour it was? Or will the daily either before or after that hour be fine,
and actually much easier to find if you've trimmed to daily by then, as
opposed to having hundreds and hundreds of hourly snapshots accumulating?
So snapshots are great but they don't come without cost, and if you keep
under 200 and if possible under 100 per subvolume, you'll find
maintenance such as balance and check (fsck) go much faster than they do
with even 500, let alone thousands.
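To illustrate the sort of thinning meant here, below is a minimal sketch of the selection logic only. The policy (keep every snapshot for two days, then the newest one per day up to 30 days, drop the rest) and the name thin_snapshots are assumptions for the example, not a recommendation of exact numbers. A real script would map the timestamps to snapshot paths and run btrfs subvolume delete on the ones selected for removal.

```python
from datetime import datetime, timedelta


def thin_snapshots(timestamps, now,
                   keep_all=timedelta(days=2),
                   keep_daily=timedelta(days=30)):
    """Split snapshot timestamps into (keep, delete) lists.

    Snapshots no older than keep_all are all kept. Older ones, up to
    keep_daily, are reduced to the newest snapshot of each calendar day.
    Anything older than keep_daily is marked for deletion.
    """
    keep, delete = [], []
    seen_days = set()
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age <= keep_all:
            keep.append(ts)
        elif age <= keep_daily and ts.date() not in seen_days:
            seen_days.add(ts.date())
            keep.append(ts)
        else:
            delete.append(ts)
    return keep, delete


# Example: 40 days of hourly snapshots on a single subvolume.
now = datetime(2017, 5, 28, 5, 0)
hourly = [now - timedelta(hours=i) for i in range(40 * 24)]
keep, delete = thin_snapshots(hourly, now)
print(len(hourly), "snapshots:", len(keep), "kept,", len(delete), "to delete")
```

With this policy the 960 hourly snapshots in the example thin down to well under 100, which is the range suggested above.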
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman