From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: Slow startup of systemd-journal on BTRFS
Date: Thu, 12 Jun 2014 04:39:00 +0000 (UTC)
Message-ID: <pan$114e$cb58c121$34286c85$f093cbbf@cox.net>
In-Reply-To: 1751055.d79eDl77JG@xev
Russell Coker posted on Thu, 12 Jun 2014 11:18:37 +1000 as excerpted:
> On Wed, 11 Jun 2014 23:28:54 Goffredo Baroncelli wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1006386
>>
>> suggested to me that the problem could be due to a bad interaction between
>> systemd and btrfs. NetworkManager was innocent. It seems that
>> systemd-journal creates very highly fragmented files when it stores its
>> log. And BTRFS is known to behave slowly when a file is highly
>> fragmented. This had caused a slow startup of systemd-journal, which in
>> turn had blocked the services which depend on the logging system.
>
> On my BTRFS/systemd systems I edit /etc/systemd/journald.conf and put
> "SystemMaxUse=50M". That doesn't solve the fragmentation problem but
> reduces it enough that it doesn't bother me.
FWIW, as a relatively new switcher to systemd (that is, after switching
to btrfs only a year or so ago), two comments:
1) Having seen a few reports of journald's journal fragmentation on this
list, I was worried about those journals here as well.
My solution to both this problem and to an unrelated frustration with
journald[1] was to:
a) confine journald to a volatile (memory-only) log only, at first
temporarily while I was still experimenting with and setting up systemd
(using init= on the kernel command line to point at systemd while
/sbin/init still pointed at sysv's init for openrc), then permanently
once I had enough of my systemd and journald config set up to actually
switch to it.
b) configure my former syslogger (syslog-ng, in my case) to continue in
that role under systemd, with journald relaying to it for non-volatile
logging.
Here are the /etc/systemd/journald.conf changes I ended up with to
accomplish (a). See the journald.conf(5) manpage for the documentation,
as well as the explanation below:
Storage=volatile
RuntimeMaxUse=448M
RuntimeKeepFree=48M
RuntimeMaxFileSize=64M
Storage=volatile is the important one. As the manpage notes, that means
journald stores files under /run/log/journal only, where /run is normally
set up by systemd as a tmpfs mount, so these files are on tmpfs and thus
memory-only.
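
Since Storage=volatile is only memory-only if /run really is tmpfs, it may be worth checking. A minimal Python sketch, assuming the usual /proc/mounts field order (device, mount point, filesystem type, ...); the function name and the sample text are hypothetical, for illustration only:

```python
def fstype_of(mount_point, mounts_text):
    """Return the filesystem type of the last mount on mount_point, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fstype, options, dump, pass.
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]  # a later entry shadows an earlier one
    return found

# Hypothetical sample in /proc/mounts format, not taken from a real system.
sample = (
    "tmpfs /run tmpfs rw,nosuid,nodev,size=524288k,mode=755 0 0\n"
    "/dev/sda2 / btrfs rw,noatime 0 0\n"
)
print(fstype_of("/run", sample))  # prints: tmpfs
```

On a live system the same function could be fed the contents of /proc/mounts instead of the sample string.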
The other three must be read in the context of a 512 MiB /run on tmpfs
[2]. From that and the information in the journald.conf manpage, it
should be possible to see that my setup is as follows (Runtime* settings
apply to the volatile files under /run):
An individual journal file size (RuntimeMaxFileSize) of 64 MiB, with
seven such files in rotation (if MaxFileSize is unset, the default is one
eighth of MaxUse, so eight files), totaling 448 MiB (RuntimeMaxUse; the
default of 10% of the filesystem is too small here, since the journals
are basically the only thing taking space). On a 512 MiB filesystem,
that leaves 64 MiB for other uses (pretty much all 0-byte lock and
pidfiles; IIRC I was running something like a 2 MiB /run before systemd
without issue).
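
The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures from the config; the function name and dictionary keys are made up for illustration and have nothing to do with journald itself:

```python
MIB = 1024 * 1024

def runtime_budget(fs_size, max_use, max_file_size):
    """Return how many full-size journal files fit under max_use,
    and how much of the filesystem is left for everything else."""
    return {
        "journal_files": max_use // max_file_size,
        "left_for_other_files": fs_size - max_use,
    }

# 512 MiB /run, RuntimeMaxUse=448M, RuntimeMaxFileSize=64M
budget = runtime_budget(512 * MIB, 448 * MIB, 64 * MIB)
print(budget["journal_files"], budget["left_for_other_files"] // MIB)  # prints: 7 64
```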
It's worth noting that UNLIKE MaxUse, which triggers journal file
rotation when hit, hitting KeepFree forces journald to stop journaling
entirely -- *NOT* just to stop writing journals here, but to stop
forwarding to syslog (syslog-ng here) as well. I FOUND THIS OUT THE HARD
WAY! Thus, in order to keep journald functional, make sure journald runs
into the MaxUse limit before it runs into KeepFree. The KeepFree default
is 15% of the filesystem, just under 77 MiB on a 512 MiB filesystem,
which is why I found this out the hard way: my settings would otherwise
have kept only 64 MiB free. The 48 MiB setting I chose leaves 16 MiB of
room for other files before journald shuts down journaling, which should
be plenty, since under normal circumstances the other files should all be
0-byte lock and pidfiles. Just in case, however, there's still 48 MiB of
room for other files after journald shuts down, before the filesystem
itself fills up.
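
To see which limit is reached first, here is a rough Python sketch. It assumes the simplified model described above (journals grow to MaxUse, and journaling stops once free space drops below KeepFree); it is not a model of journald's actual internals, and the function name is hypothetical:

```python
MIB = 1024 * 1024

def keepfree_headroom(fs_size, max_use, keep_free):
    """Bytes that other files may occupy, with journals at max_use,
    before free space falls below keep_free. A negative result means
    KeepFree is reached before the journals ever grow to max_use."""
    return fs_size - max_use - keep_free

fs = 512 * MIB
cases = (
    ("default 15%", fs * 15 // 100),   # the KeepFree default described above
    ("KeepFree=48M", 48 * MIB),        # the value chosen in the config
)
for label, keep_free in cases:
    room = keepfree_headroom(fs, 448 * MIB, keep_free)
    verdict = "MaxUse is hit first" if room >= 0 else "KeepFree is hit first"
    print(f"{label}: {verdict}")
# prints:
# default 15%: KeepFree is hit first
# KeepFree=48M: MaxUse is hit first
```

With the 48 MiB setting the headroom works out to 16 MiB, matching the figure above.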
Configuring the syslogger to work with journald is "left as an exercise
for the reader", as they say, since for all I know the OP is using
something other than the syslog-ng I'm familiar with anyway. But if
hints for syslog-ng are needed too, let me know. =:^)
2) Someone else mentioned btrfs' autodefrag mount-option. Given #1 above
I've obviously not had a lot of experience with journald logs and
autodefrag, but based on all I know about btrfs fragmentation behavior as
well as journald journal file behavior from this list, as long as
journald's non-volatile files are kept significantly under 1 GiB and
preferably under half a GiB each, it shouldn't be a problem, with a
/possible/ exception if you get something run-away-journaling multiple
messages a second for a reasonably long period, such that the I/O can't
keep up with both the journaling and autodefrag.
If you do choose to keep a persistent journal with autodefrag, then, I'd
recommend journald.conf settings that keep individual journal files to
perhaps 128 MiB each. (System* settings apply to the non-volatile files
under /var, in /var/log/journal/.)
SystemMaxFileSize=128M
AFAIK, that wouldn't affect the total journal size and thus the number of
journal files, which would remain 10% of the filesystem size by default.
Alternatively, given the default 8-file rotation if MaxFileSize is unset,
you could limit the total journal size to 1 GiB, for the same 128 MiB
individual file size.
SystemMaxUse=1G
Of course, if you want or need more control, set both and/or other
settings, as I did for my volatile-only configuration above.
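
The relationship between the two settings can be sketched as follows, assuming (as described above) that an unset MaxFileSize defaults to one eighth of MaxUse. The constant and function names are made up for illustration:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
DEFAULT_ROTATION = 8  # assumption from the text: MaxFileSize defaults to MaxUse / 8

def derived_max_file_size(max_use, rotation=DEFAULT_ROTATION):
    """Per-file size that results when only MaxUse is set."""
    return max_use // rotation

# SystemMaxUse=1G with MaxFileSize left unset
print(derived_max_file_size(1 * GIB) // MIB)  # prints: 128
```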
---
[1] Unrelated journald frustration: Being a syslog-ng user I've been
accustomed to being able to pre-filter incoming messages *BEFORE* they
get written to the files. This ability has been the important bit of my
run-away-log coping strategy when I have no direct way to reconfigure the
source to reduce the rate at which it's spitting out "noise" that's
otherwise overwhelming my logs.
Unfortunately, while journald seems to have all /sorts/ of
file-grep-style filtering tools to focus in like a laser on what you want
to see AFTER the journal is written, I found absolutely NO documentation
on setting up PREWRITE journal filters. (There's log-level filtering and
global rate-limiting, but I didn't want to use those; I wanted to filter
specific "noise" messages.) That means runaway journaling simply runs
away, and if I use the size restrictions to turn it down, I quickly lose
the important stuff I want to keep around when the files size-rotate due
to that runaway!
Thus my solution: keep journald storage volatile only, relatively small
but still big enough that I can use the great systemctl status
integration to get the last few journal entries from each service, then
feed it to syslog-ng to pre-write filter the noise out before the actual
write to permanent storage. =:^)
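
The idea of pre-write filtering, as opposed to filtering when reading, can be illustrated with a small Python sketch. This is NOT syslog-ng configuration syntax and not anything journald offers; the pattern and the sample messages are invented purely to show the concept of dropping known noise before it ever reaches permanent storage:

```python
import re

# Hypothetical noise pattern, purely for illustration.
NOISE_PATTERNS = [re.compile(r"chatty-daemon: heartbeat")]

def prewrite_filter(lines, noise_patterns):
    """Yield only the lines that match none of the noise patterns."""
    for line in lines:
        if not any(p.search(line) for p in noise_patterns):
            yield line

# Made-up incoming messages.
incoming = [
    "chatty-daemon: heartbeat ok",
    "important-service: something worth keeping",
    "chatty-daemon: heartbeat ok",
]
kept = list(prewrite_filter(incoming, NOISE_PATTERNS))
print(kept)  # prints: ['important-service: something worth keeping']
```

Because the noise never reaches the file, it cannot push the messages worth keeping out of a size-limited log.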
[2] 512 MiB /run tmpfs. This is on a 16 GiB RAM system, so default tmpfs
size would be 8 GiB. But I have several such tmpfs including a big /tmp
that I use for scratch space when I'm building stuff (gentoo's
PORTAGE_TMPDIR, and for fully in-memory DVD ISOs too), and I don't have
swap configured at all, so keeping a reasonable lid on things by
limiting /run and its major journal-file space user to half a GiB seems
prudent.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman