linux-btrfs.vger.kernel.org archive mirror
* Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
From: Konstantinos Skarlatos @ 2015-02-19 14:30 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: Lennart Poettering

Systemd 219 now sets the special FS_NOCOW file flag for its journal 
files[1]. This unfortunately breaks the ability to repair the journal on 
RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is 
this something that can be configured for systemd? Is btrfs going to 
someday fix the fragmentation problem, making this option redundant?


[1] http://lists.freedesktop.org/archives/systemd-devel/2015-February/028447.html

         * journald now sets the special FS_NOCOW file flag for its
           journal files. This should improve performance on btrfs, by
           avoiding heavy fragmentation when journald's write-pattern
           is used on COW file systems. It degrades btrfs' data
           integrity guarantees for the files to the same levels as for
           ext3/ext4 however. This should be OK though as journald does
           its own data integrity checks and all its objects are
           checksummed on disk. Also, journald should handle btrfs disk
           full events a lot more gracefully now, by processing SIGBUS
           errors, and not relying on fallocate() anymore.
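For readers unfamiliar with the flag: FS_NOCOW_FL is the inode attribute that `chattr +C` sets, and on btrfs a NOCOW file also loses its data checksums, which is why a RAID1/5/6 scrub can no longer tell a good copy from a bad one. The Python sketch below shows how a program might set the flag on a freshly created file through the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls. It is not journald's code; the ioctl numbers assume a 64-bit Linux ABI, and `create_nocow_file` and the other helper names are hypothetical.

```python
import fcntl
import os
import struct

# Constants from <linux/fs.h>. The ioctl numbers assume a 64-bit kernel ABI
# (e.g. x86_64, aarch64), where _IOR/_IOW('f', n, long) encode an 8-byte size;
# 32-bit platforms use 0x80046601 / 0x40046602 instead.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # the bit that `chattr +C` sets


def with_nocow(flags: int) -> int:
    """Return the inode flag word with FS_NOCOW_FL added."""
    return flags | FS_NOCOW_FL


def has_nocow(flags: int) -> bool:
    """True if the inode flag word has FS_NOCOW_FL set."""
    return bool(flags & FS_NOCOW_FL)


def get_inode_flags(fd: int) -> int:
    # The kernel reads/writes a C int here despite the "long" in the macro.
    buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
    return struct.unpack("I", buf)[0]


def set_inode_flags(fd: int, flags: int) -> None:
    fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("I", flags))


def create_nocow_file(path: str) -> int:
    """Hypothetical helper: create an empty file and mark it NOCOW
    before any data is written. Returns the open file descriptor."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o640)
    try:
        set_inode_flags(fd, with_nocow(get_inode_flags(fd)))
    except OSError:
        # Filesystem rejected the flag: clean up the empty file and re-raise.
        os.close(fd)
        os.unlink(path)
        raise
    return fd
```

The flag is set before the first write because btrfs only applies NOCOW reliably to empty files (or to new files created inside a `+C` directory, which inherit it). Filesystems that don't support the flag may reject the ioctl or silently ignore it, so re-reading with `has_nocow(get_inode_flags(fd))` is the safer check.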

* Re: Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
From: Chris Murphy @ 2015-02-19 17:51 UTC
  To: Btrfs BTRFS, Lennart Poettering

On Thu, Feb 19, 2015 at 7:30 AM, Konstantinos Skarlatos
<k.skarlatos@gmail.com> wrote:
> Systemd 219 now sets the special FS_NOCOW file flag for its journal
> files[1]. This unfortunately breaks the ability to repair the journal on
> RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is
> this something that can be configured for systemd? Is btrfs going to someday
> fix the fragmentation problem, making this option redundant?

Chris is looking at a per file autodefrag setting, last I read. I
think that's a better way forward. I'm finding that +C on journals is
an OK short term workaround; the problem is that if the containing
subvolume is subject to snapshots, in effect the +C benefit is
thwarted. I have a ~2 week old journal subject to merely 6 snapshots
in that time, and it has over 9000 extents despite +C.
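An extent count like the one above can be checked with `filefrag` from e2fsprogs, which uses the FIEMAP ioctl and works on btrfs. The helper below is a minimal Python sketch, assuming `filefrag` is on PATH and prints its usual one-line "N extents found" summary; `parse_filefrag` and `count_extents` are hypothetical names.

```python
import re
import subprocess

# filefrag's summary line looks like: "/path/to/file: 9123 extents found"
# (singular "extent" when the count is 1).
_FILEFRAG_RE = re.compile(r":\s+(\d+) extents? found$")


def parse_filefrag(output: str) -> int:
    """Extract the extent count from filefrag's one-line summary."""
    match = _FILEFRAG_RE.search(output.strip())
    if match is None:
        raise ValueError(f"unrecognized filefrag output: {output!r}")
    return int(match.group(1))


def count_extents(path: str) -> int:
    """Run filefrag on `path` and return its extent count."""
    result = subprocess.run(
        ["filefrag", path], capture_output=True, text=True, check=True
    )
    return parse_filefrag(result.stdout)
```

Comparing `count_extents()` on a journal file before and after a snapshot plus some journal writes shows the effect described above: the first write to each block still shared with a snapshot is copied even on a `+C` file, so every snapshot reintroduces fragmentation.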

I think autodefragging journals on HDD is OK, but I'm uncertain if
this is really necessary on SSD.
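If one wanted to act on the HDD/SSD distinction automatically, Linux exposes it per disk in sysfs as /sys/block/<dev>/queue/rotational ("1" for spinning media, "0" for non-rotational). The Python sketch below encodes a hypothetical "defragment journals only on rotational disks" policy; both the function names and the policy itself are illustrative assumptions, not anything btrfs or journald implements.

```python
from pathlib import Path


def is_rotational(sysfs_value: str) -> bool:
    """Interpret the contents of /sys/block/<dev>/queue/rotational."""
    return sysfs_value.strip() == "1"


def should_defrag_journal(block_dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Hypothetical policy: defragment journals only on rotational disks.

    `block_dev` must be a whole-disk name such as "sda" or "nvme0n1";
    partitions have no queue/ directory under /sys/block.
    """
    path = Path(sysfs_root) / block_dev / "queue" / "rotational"
    return is_rotational(path.read_text())
```

On a multi-device btrfs filesystem a real policy would have to look at every member disk, which this sketch does not attempt.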

I'm not sure how to make this better without adding complexity. I just
had the idea of a journal-specific partition, which kinda decouples the
dependency on filesystems being mounted so the journal can
persistently write to stable media earlier in boot and later in
shutdown, and some other things are easier for journald also. But it
might make some people scream.


-- 
Chris Murphy

* Re: Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
From: Duncan @ 2015-02-19 21:23 UTC
  To: linux-btrfs

Chris Murphy posted on Thu, 19 Feb 2015 10:51:57 -0700 as excerpted:

> Chris is looking at a per file autodefrag setting,

Just to be clear, that's Chris _Mason_, not a third-person reference by 
Chris _Murphy_ to himself... =:^)

People who know that Chris _Mason_ is btrfs lead dev won't have been 
confused, but the above "Chris" reference could certainly be confusing to 
those who don't/didn't. =:^\

I've struggled with this myself.  Significantly, not even the customary 
informal disambiguation, first name, last initial, "Chris M", 
disambiguates things here.  As a result, I've been trying to get myself 
in the habit of always using full first and last, for both people, even 
in contexts where it's a bit awkward.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Systemd 219 journald now sets the FS_NOCOW file flag for its journal files, possibly breaking RAID repairs.
From: Duncan @ 2015-02-19 22:57 UTC
  To: linux-btrfs

Konstantinos Skarlatos posted on Thu, 19 Feb 2015 16:30:37 +0200 as
excerpted:

> Systemd 219 now sets the special FS_NOCOW file flag for its journal
> files[1]. This unfortunately breaks the ability to repair the journal on
> RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is
> this something that can be configured for systemd?

IIRC I suggested that it be configurable, but default to nocow, in the 
original discussion here, some months ago.  But while I saw the 219 
announcement, I've not upgraded to it yet, to know whether that 
suggestion was implemented or not.

What I can say with some confidence is that if it's implemented, it 
should be quite well documented.  Systemd gets VERY high marks here for 
generally high configurability and even better consistency in documenting 
that configurability, almost to an extreme.  That's the way I like it, 
and it's the one thing that made the switch to systemd as easy as it was 
here: I'm a gentooer, with the famous gentooer taste for extreme 
customization, so without that documentation I'd have been unlikely to 
switch, and griping all the way if I did. =:^)

Try the journald.conf manpage.  If the option exists, journald.conf is 
almost certainly where it's located, and thus that manpage where it'll 
almost certainly be documented.  If there's nothing about it there, then 
I'd put chances at better than 98% that there's no such option, beyond 
patching one in yourself, of course, systemd being freedomware as it is. 
=:^)

Meanwhile...

Insert customary gripe about binary-format log-files here...

What I've done here (and posted before, with someone else saying it was a 
good idea he'd try, so I'm posting a bit more detail this time, hopefully 
saving some duplicated trial and error to arrive at workable settings) is 
this:

1) Journald is configured for single-session mode, journaling to tmpfs 
only.

This gives me the best of the journald benefits, such as getting the last 
few log entries for a service when I run systemctl status on it, etc., 
without having to hassle with the binary journal format on permanent storage.

2) My previous syslog service (syslog-ng) remains installed, writing its 
usual text-based logs to permanent storage in traditional log append-only 
fashion.  That works as it always has, with my syslog-ng config for 
sorting messages to various logfiles after dumping any "noise" straight 
to /dev/null.

Additional notes:

3) Logrotate is still used, now with a systemd timer job instead of the 
old cron job (I unmerged cron after switching everything to systemd 
timers), to handle log rotation.

4) Noise:  Unfortunately I've not found a way to tell journald to simply 
dump certain "noise" messages to /dev/null, without tracking them or 
writing them anywhere else.  There's a reasonable variety of options to 
filter message display, but they're all post-storage (tmpfs and if 
configured, permanent), so such noise continues to bulk up the journals 
unnecessarily, even when it really *IS* noise that I don't want tracked 
or stored AT ALL.  An example would be a job I have that executes a sudo 
every few seconds, multiple times per minute.  There's a way to tell sudo 
not to log it, but pam still logs it and I never found a way to tell pam 
not to.  Back when I was running only syslog-ng, it was easy enough to 
set up a filter that dumped that "noise" to /dev/null and never actually 
logged it anywhere, so the logs didn't quickly fill up with this stupid 
pam noise.  Unfortunately, I've not figured out a way to tell journald 
the same thing, so it still tracks all that pam noise.  I can filter the 
post-journal output, but it's still in the journal; no way I can see to 
kill that.

Fortunately, however, the journal's binary format is compressed, and it 
apparently compresses that down to one instance, with all the rest being 
effectively a compressed timestamp and a reference back to the first one 
for the other tracked details.  So that doesn't affect the tmpfs size of 
the journals so badly, and the writes to permanent storage don't happen 
since journald has that turned off, and syslog-ng has it filtered out 
*before* it gets logged.

5) FWIW, notes on my config:

5a) journald.conf non-default settings:

ForwardToSyslog=yes
RuntimeKeepFree=48M
RuntimeMaxFileSize=64M
RuntimeMaxUse=448M
Storage=volatile
TTYPath=/dev/tty12

5b) syslog-ng, as I guess most sysloggers, now has a build-time configure 
option for systemd, which links it to the journald libs and eases 
cooperation between journald and the traditional syslogger.  On gentoo 
that's controlled by the systemd USE flag.  Naturally I have it on (as 
will most systemd-based binary distros, for their logger packages, I 
imagine).  I believe something similar could be configured without that, 
but it'd be much more difficult and could result in some stuff appearing 
in syslog twice, while other stuff might get missed.

5c) The tmpfs I had mounted on /run, which is where journald stores its 
volatile journals, was originally too small, as all it used to handle was 
a few tiny pid files and some zero-size lock files.  With journald 
logging a full boot session, sometimes several days long, to tmpfs only, 
I had to increase the maximum tmpfs size for /run.  It's now 512 MiB (half 
a gig), still small enough not to have to worry about too much on a 16 GiB 
memory system, but large enough to handle several days' worth of journals, 
including the "noise" I griped about in #4.  That's also the reason behind 
the RuntimeMaxUse and RuntimeKeepFree settings above: with a tmpfs limit 
that strict, /designed/ to have journald use /most/ of it, journald's 
default size settings, which are meant to stop logging when space gets too 
tight, would stop it while there's still plenty of space left on the tmpfs.

Obviously you can adjust the tmpfs size and Runtime* settings in 
journald.conf as necessary for your system.

5d) I mentioned using systemd timers instead of cron to trigger 
logrotate.  That's out of scope for this post, but if someone's 
interested enough to request them, I can post them...
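The sizing in 5a and 5c comes down to simple arithmetic. journald.conf(5) says journald respects both RuntimeMaxUse and RuntimeKeepFree and uses the smaller of the two resulting limits, so on a 512 MiB /run, 448M of journals plus 48M kept free leaves 16 MiB for everything else in /run before RuntimeKeepFree becomes the binding limit. The Python sketch below parses the 5a settings (wrapped in the [Journal] section header the real file uses) and does that calculation. It assumes plain integer values with base-1024 K/M/G/T suffixes, and `check_runtime_budget` is a hypothetical helper, not part of systemd.

```python
import configparser

# The non-default settings from 5a, inside journald.conf's [Journal] section.
JOURNALD_CONF = """
[Journal]
ForwardToSyslog=yes
RuntimeKeepFree=48M
RuntimeMaxFileSize=64M
RuntimeMaxUse=448M
Storage=volatile
TTYPath=/dev/tty12
"""

# journald size suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert e.g. "48M" to bytes (integers only, in this sketch)."""
    value = value.strip().upper()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)


def check_runtime_budget(conf_text: str, tmpfs_bytes: int) -> dict:
    """Hypothetical check of the Runtime* limits against a /run tmpfs size."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep journald's CamelCase keys as-is
    parser.read_string(conf_text)
    journal = parser["Journal"]
    max_use = parse_size(journal["RuntimeMaxUse"])
    keep_free = parse_size(journal["RuntimeKeepFree"])
    max_file = parse_size(journal["RuntimeMaxFileSize"])
    return {
        # True if RuntimeMaxUse can be reached without violating RuntimeKeepFree.
        "fits": max_use + keep_free <= tmpfs_bytes,
        # Room left for other /run contents before RuntimeKeepFree binds.
        "headroom_MiB": (tmpfs_bytes - max_use - keep_free) // 1024**2,
        # How many full-size journal files fit under RuntimeMaxUse.
        "files_at_max_use": max_use // max_file,
    }
```

For the 512 MiB /run described in 5c, `check_runtime_budget(JOURNALD_CONF, 512 * 1024**2)` returns `{'fits': True, 'headroom_MiB': 16, 'files_at_max_use': 7}`.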


I've been happy with this setup and believe it gives me the best of both 
worlds: journald's nice log output in service status reports for the 
current session, plus only what I want logged to permanent storage, in my 
preferred text format and sorted in my preferred way into various files, 
for both this session and previous sessions until I logrotate the logs 
out, just as I was doing before systemd.
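The sort-then-discard behavior Duncan relies on syslog-ng for in points 2 and 4, which he could not reproduce in journald, amounts to an ordered list of rules where the first match wins and a noise rule's destination is "nowhere". The Python model below illustrates only that idea; it is not syslog-ng configuration syntax, and the program names, the pam message pattern, and the log file paths are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs a predicate over (program, message) with a destination;
# a destination of None means "drop it", i.e. the /dev/null case.
Rule = Tuple[Callable[[str, str], bool], Optional[str]]

# Hypothetical rules; order matters because the first match wins.
RULES: List[Rule] = [
    # Noise: PAM session chatter from a frequently run sudo job.
    (lambda prog, msg: prog == "sudo" and msg.startswith("pam_unix(sudo:session)"), None),
    (lambda prog, msg: prog == "kernel", "/var/log/kernel.log"),
    (lambda prog, msg: True, "/var/log/messages"),  # catch-all
]


def route(program: str, message: str, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the log file a message should go to, or None to discard it."""
    for matches, destination in rules:
        if matches(program, message):
            return destination
    return None
```

The point of the design is that the drop happens before anything is stored, so discarded messages never reach a log file, whereas journald's filtering options only apply after the messages are already in the journal.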

And I don't have to worry about systemd's binary journal format on btrfs! 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

