* Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
@ 2015-02-19 14:30 Konstantinos Skarlatos
2015-02-19 17:51 ` Chris Murphy
2015-02-19 22:57 ` Duncan
0 siblings, 2 replies; 4+ messages in thread
From: Konstantinos Skarlatos @ 2015-02-19 14:30 UTC (permalink / raw)
To: btr >> linux-btrfs@vger.kernel.org; +Cc: lennart Poettering
Systemd 219 now sets the special FS_NOCOW file flag for its journal
files[1]. This unfortunately breaks the ability to repair the journal on
RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is
this something that can be configured for systemd? Is btrfs going to
someday fix the fragmentation problem, making this option redundant?
[1]
http://lists.freedesktop.org/archives/systemd-devel/2015-February/028447.html
* journald now sets the special FS_NOCOW file flag for its
journal files. This should improve performance on btrfs, by
avoiding heavy fragmentation when journald's write-pattern
is used on COW file systems. It degrades btrfs' data
integrity guarantees for the files to the same levels as for
ext3/ext4 however. This should be OK though as journald does
its own data integrity checks and all its objects are
checksummed on disk. Also, journald should handle btrfs disk
full events a lot more gracefully now, by processing SIGBUS
errors, and not relying on fallocate() anymore.
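
For reference, a minimal sketch of what setting this flag on a file looks like from userspace, roughly what `chattr +C` does. This is not journald's actual code. The ioctl numbers are the <linux/fs.h> values for 64-bit Linux and are an assumption about the target machine; `set_nocow` is a hypothetical helper name.

```python
import fcntl
import os
import struct

# Constants from <linux/fs.h>. The two ioctl numbers encode sizeof(long),
# so the values below are only valid on 64-bit Linux (assumption).
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000


def with_nocow(flags):
    """Return the inode flag word with the NOCOW bit added."""
    return flags | FS_NOCOW_FL


def has_nocow(flags):
    """True if the NOCOW bit is set in an inode flag word."""
    return bool(flags & FS_NOCOW_FL)


def get_flags(fd):
    """Read the inode flags of an open file descriptor."""
    buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
    return struct.unpack("I", buf)[0]


def set_nocow(path):
    """Set NOCOW on path (hypothetical helper).

    On btrfs the flag is only reliable on empty or newly created files,
    and the filesystem must support the flag at all, otherwise the ioctl
    raises OSError.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        flags = get_flags(fd)
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("I", with_nocow(flags)))
    finally:
        os.close(fd)
```

Whether a given journal file carries the flag can also be checked with `lsattr`, which shows it as `C`.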
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
2015-02-19 14:30 Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs Konstantinos Skarlatos
@ 2015-02-19 17:51 ` Chris Murphy
2015-02-19 21:23 ` Duncan
2015-02-19 22:57 ` Duncan
1 sibling, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2015-02-19 17:51 UTC (permalink / raw)
To: Btrfs BTRFS, Lennart Poettering
On Thu, Feb 19, 2015 at 7:30 AM, Konstantinos Skarlatos
<k.skarlatos@gmail.com> wrote:
> Systemd 219 now sets the special FS_NOCOW file flag for its journal
> files[1]. This unfortunately breaks the ability to repair the journal on
> RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is
> this something that can be configured for systemd? Is btrfs going to someday
> fix the fragmentation problem, making this option redundant?
Chris is looking at a per file autodefrag setting, last I read. I
think that's a better way forward. I'm finding that +C on journals is
an OK short term workaround; the problem is that if the containing
subvolume is subject to snapshots, in effect the +C benefit is
thwarted. I have a ~2 week old journal subject to merely 6 snapshots
in that time, and it has over 9000 extents despite +C.
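
The message does not say how the extent count was obtained. One common way is `filefrag` from e2fsprogs; the sketch below assumes that tool is installed and that its summary line has the usual "N extents found" form. The function names are hypothetical.

```python
import re
import subprocess

# Matches the summary line filefrag prints, e.g. "foo: 12 extents found".
_EXTENTS_RE = re.compile(r": (\d+) extents? found\s*$")


def parse_extent_count(filefrag_line):
    """Extract the extent count from a filefrag summary line."""
    match = _EXTENTS_RE.search(filefrag_line)
    if match is None:
        raise ValueError("unrecognized filefrag output: %r" % filefrag_line)
    return int(match.group(1))


def extent_count(path):
    """Run filefrag on path and return its reported extent count."""
    result = subprocess.run(
        ["filefrag", path], check=True, capture_output=True, text=True
    )
    # The summary is the last line of output.
    return parse_extent_count(result.stdout.strip().splitlines()[-1])
```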
I think autodefragging journals on HDD is OK, but I'm uncertain if
this is really necessary on SSD.
I'm not sure how to make this better without adding complexity. I just
had an idea of a journal specific partition which kinda decouples the
dependency on filesystems being mounted so the journal can
persistently write to stable media earlier in boot and later in
shutdown, and some other things are easier for journald also. But it
might make some people scream.
--
Chris Murphy
* Re: Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
2015-02-19 17:51 ` Chris Murphy
@ 2015-02-19 21:23 ` Duncan
0 siblings, 0 replies; 4+ messages in thread
From: Duncan @ 2015-02-19 21:23 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Thu, 19 Feb 2015 10:51:57 -0700 as excerpted:
> Chris is looking at a per file autodefrag setting,
Just to be clear, that's Chris _Mason_, not a third-person reference by
Chris _Murphy_ to himself... =:^)
People who know that Chris _Mason_ is btrfs lead dev won't have been
confused, but the above "Chris" reference could certainly be confusing to
those who don't/didn't. =:^\
I've struggled with this myself. Significantly, not even the customary
informal disambiguation, first name, last initial, "Chris M",
disambiguates things here. As a result, I've been trying to get myself
in the habit of always using full first and last, for both people, even
in contexts where it's a bit awkward.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
2015-02-19 14:30 Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs Konstantinos Skarlatos
2015-02-19 17:51 ` Chris Murphy
@ 2015-02-19 22:57 ` Duncan
1 sibling, 0 replies; 4+ messages in thread
From: Duncan @ 2015-02-19 22:57 UTC (permalink / raw)
To: linux-btrfs
Konstantinos Skarlatos posted on Thu, 19 Feb 2015 16:30:37 +0200 as
excerpted:
> Systemd 219 now sets the special FS_NOCOW file flag for its journal
> files[1]. This unfortunately breaks the ability to repair the journal on
> RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is
> this something that can be configured for systemd?
IIRC I suggested that it be configurable, but default to nocow, in the
original discussion here, some months ago. But while I saw the 219
announcement, I've not upgraded to it yet, to know whether that
suggestion was implemented or not.
What I can say with some confidence is that if it's implemented, it
should be quite well documented. Systemd gets VERY high marks here for
generally high configurability and even better consistency in documenting
that configurability, almost to an extreme, but that's the way I like it.
It's the one thing that made the switch to systemd as easy as it was here:
I'm a gentooer, with the famous gentooer liking for extreme customization,
and without that documentation I'd have been unlikely to switch, and
griping all the way if I did. =:^)
Try the journald.conf manpage. If the option exists, journald.conf is
almost certainly where it's located, and thus that manpage where it'll
almost certainly be documented. If there's nothing about it there, then
I'd put chances at better than 98% that there's no such option, beyond
patching one in yourself, of course, systemd being freedomware as it is.
=:^)
Meanwhile...
Insert customary gripe about binary-format log-files here...
What I've done here (and posted before, with someone else saying it was a
good idea he'd try, so I'm posting a bit more detail this time, hopefully
saving some duplicated trial and error to arrive at workable settings) is
this:
1) Journald is configured for single-session mode, journaling to tmpfs
only.
This gives me the best of the journald benefits, getting the last few log
entries for a service when I run a systemctl status on it, etc, without
having to hassle with the binary journal format on permanent storage.
2) My previous syslog service (syslog-ng) remains installed, writing its
usual text-based logs to permanent storage in traditional log append-only
fashion. That works as it always has, with my syslog-ng config for
sorting messages to various logfiles after dumping any "noise" straight
to /dev/null.
Additional notes:
3) Logrotate is still used, now with a systemd timer job instead of the
old cron job (I unmerged cron after switching everything to systemd
timers), to handle log rotation.
4) Noise: Unfortunately I've not found a way to tell journald to simply
dump certain "noise" messages to /dev/null, without tracking them or
writing them anywhere else. There's a reasonable variety of options to
filter message display, but they're all post-storage (tmpfs and if
configured, permanent), so such noise continues to bulk up the journals
unnecessarily, even when it really *IS* noise that I don't want tracked
or stored AT ALL. An example would be a job I have that executes an sudo
every few seconds, multiple times per minute. There's a way to tell sudo
not to log it, but pam still logs it and I never found a way to tell it
not to. Back when I was running only syslog-ng, it was easy enough to
setup a filter that dumped that "noise" to /dev/null and never actually
logged it anywhere, so the logs didn't quickly fill up with this stupid
pam noise. Unfortunately, I've not figured out a way to tell journald
the same thing, so it still tracks all that pam noise. I can filter the
post-journal output, but it's still in the journal; no way I can see to
kill that.
Fortunately, however, the journal's binary format is compressed, and it
apparently compresses that down to one instance, with all the rest being
effectively a compressed timestamp and a reference back to the first one
for the other tracked details. So that doesn't affect the tmpfs size of
the journals so badly, and the writes to permanent storage don't happen
since journald has that turned off, and syslog-ng has it filtered out
*before* it gets logged.
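
To illustrate the post-storage filtering described in #4, here is a minimal sketch that drops noisy entries at display time from `journalctl -o json` output, which emits one JSON object per line. The entries stay in the journal; only the output is filtered. `drop_noise` and the choice of matching on the SYSLOG_IDENTIFIER field are assumptions for illustration, not the setup described above.

```python
import json


def drop_noise(json_lines, noisy_identifiers):
    """Yield parsed journal entries, skipping those from noisy sources.

    json_lines: iterable of lines as produced by `journalctl -o json`.
    noisy_identifiers: set of SYSLOG_IDENTIFIER values to suppress.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("SYSLOG_IDENTIFIER") in noisy_identifiers:
            continue
        yield entry
```

It could be fed from something like `journalctl -b -o json`, iterating over standard input and printing each surviving entry's MESSAGE field.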
5) FWIW, notes on my config:
5a) journald.conf non-default settings:
ForwardToSyslog=yes
RuntimeKeepFree=48M
RuntimeMaxFileSize=64M
RuntimeMaxUse=448M
Storage=volatile
TTYPath=/dev/tty12
5b) syslog-ng, as I guess most sysloggers, now has a build-time configure
option for systemd, which links it to the journald libs and eases
cooperation between journald and the traditional syslogger. On gentoo
that's controlled by the systemd USE flag. Naturally I have it on (as
will most systemd-based binary distros, for their logger packages, I
imagine). I believe something similar could be configured without that,
but it'd be much more difficult and could result in some stuff appearing
in syslog twice, while other stuff might get missed.
5c) The tmpfs I had mounted on /run, which is where journald stores its
volatile journals, was originally too small, as all it used to handle was
a few tiny pid files and some zero-size lock files. With journald
logging a full, sometimes several-day, boot session to tmpfs only, I had to
increase the maximum tmpfs size for /run. It's now 512 MiB (half a gig),
still small enough not to have to worry about too much on a 16 GiB memory
system, but large enough to handle several days worth of journals,
including the "noise" I griped about in #4. That's the reason behind
the RuntimeMaxUse and RuntimeKeepFree options above: with a tmpfs limit
that strict, /designed/ to have journald use /most/ of it, journald's
default size settings, designed to shut down logging when space gets too
tight, would shut it down while there's still plenty of space left on the tmpfs.
Obviously you can adjust the tmpfs size and Runtime* settings in
journald.conf as necessary for your system.
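
As a rough check of how the numbers in 5a and 5c fit together, the sketch below models journald's limit as the smaller of RuntimeMaxUse and whatever remains after reserving RuntimeKeepFree. This is a simplified model rather than journald's exact accounting, and the function names are hypothetical. Size suffixes are treated as powers of 1024, as the journald.conf manpage describes.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
MIB = 1024**2


def parse_size(text):
    """Parse a journald.conf style size such as '448M' into bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def effective_journal_cap(fs_size, max_use, keep_free, other_used=0):
    """Bytes the journal may occupy under a simplified two-limit model.

    other_used is space on the same filesystem taken by non-journal files.
    """
    return max(0, min(max_use, fs_size - other_used - keep_free))


tmpfs_size = 512 * MIB
max_use = parse_size("448M")
keep_free = parse_size("48M")

# With an otherwise empty /run, RuntimeMaxUse is the binding limit.
cap_empty = effective_journal_cap(tmpfs_size, max_use, keep_free)

# With 32 MiB of other files in /run, RuntimeKeepFree becomes binding.
cap_busy = effective_journal_cap(tmpfs_size, max_use, keep_free, 32 * MIB)
```

Under this model the journal can grow to 448 MiB on an otherwise empty 512 MiB /run, leaving 64 MiB, of which 48 MiB is reserved as free space.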
5d) I mentioned using systemd timers instead of cron to trigger
logrotate. That's out of scope for this post, but if someone's
interested enough to request them, I can post them...
I've been happy with this setup and believe it gives me the best of both
worlds, journald's nice log output in service status reports for the
current session, only what I want logged to permanent storage, in my
preferred text format and sorted in my preferred way into various files,
both for this session and previous sessions, until I logrotate the logs
out, just as I was doing before systemd.
And I don't have to worry about systemd's binary journal format on btrfs!
=:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman