linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lennart Poettering <lennart@poettering.net>
To: Chris Murphy <lists@colorremedies.com>
Cc: Dave Chinner <david@fromorbit.com>,
	kreijack@inwind.it,
	systemd Mailing List <systemd-devel@lists.freedesktop.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [systemd-devel] Slow startup of systemd-journal on BTRFS
Date: Mon, 16 Jun 2014 00:34:21 +0200	[thread overview]
Message-ID: <20140615223421.GF24386@tango.0pointer.de> (raw)
In-Reply-To: <5E3380D5-FF9F-4152-B115-7D16CD8CC215@colorremedies.com>

On Wed, 11.06.14 20:32, Chris Murphy (lists@colorremedies.com) wrote:

> > systemd has a very stupid journal write pattern. It checks if there
> > is space in the file for the write, and if not it fallocates the
> > small amount of space it needs (it does *4 byte* fallocate calls!)

Not really the case. 

http://cgit.freedesktop.org/systemd/systemd/tree/src/journal/journal-file.c#n354

We allocate 8mb at minimum.

> > and then does the write to it.  All this does is fragment the crap
> > out of the log files because the filesystems cannot optimise the
> > allocation patterns.

Well, it would be good if you'd tell me what to do instead...

I am invoking fallocate() in advance, because we write those files with
mmap() and that of course would normally triggered SIGBUS already on the
most boring of reasons, such as disk full/quota full or so. Hence,
before we do anything like that, we invoke fallocate() to ensure that
the space is actually available... As far as I can see, that pretty much
in line with what fallocate() is supposed to be useful for, the man page
says this explicitly:

     "...After a successful call to posix_fallocate(), subsequent writes
      to bytes in the specified range are guaranteed not to fail because
      of lack of disk space."

Happy to be informed that the man page is wrong. 

I am also happy to change our code, if it really is the wrong thing to
do. Note however that I generally favour correctness and relying on
documented behaviour, instead of nebulous optimizations whose effects
might change with different file systems or kernel versions...

> > Yup, it fragments journal files on XFS, too.
> > 
> > http://oss.sgi.com/archives/xfs/2014-03/msg00322.html
> > 
> > IIRC, the systemd developers consider this a filesystem problem and
> > so refused to change the systemd code to be nice to the filesystem
> > allocators, even though they don't actually need to use fallocate...

What? No need to be dick. Nobody ever pinged me about this. And yeah, I
think I have a very good reason to use fallocate(). The only reason in
fact the man page explicitly mentions.

Lennart

-- 
Lennart Poettering, Red Hat

  reply	other threads:[~2014-06-15 22:34 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-11 21:28 Slow startup of systemd-journal on BTRFS Goffredo Baroncelli
2014-06-12  0:40 ` Chris Murphy
2014-06-12  1:18 ` Russell Coker
2014-06-12  4:39   ` Duncan
2014-06-12  1:21 ` Dave Chinner
2014-06-12  1:37   ` Dave Chinner
2014-06-12  2:32     ` Chris Murphy
2014-06-15 22:34       ` Lennart Poettering [this message]
2014-06-16  4:01         ` [systemd-devel] " Chris Murphy
2014-06-16  4:38           ` cwillu
  -- strict thread matches above, loose matches on Subject: below --
2014-06-12 11:13 R: " Goffredo Baroncelli <kreijack@libero.it>
2014-06-12 12:37 ` Duncan
2014-06-12 23:24   ` Dave Chinner
2014-06-13 22:19     ` Goffredo Baroncelli
2014-06-14  2:53       ` Duncan
2014-06-14  7:52         ` Goffredo Baroncelli
2014-06-15  5:43           ` Duncan
2014-06-15 22:39             ` [systemd-devel] " Lennart Poettering
2014-06-15 22:13           ` Lennart Poettering
2014-06-16  0:17             ` Russell Coker
2014-06-16  1:06               ` John Williams
2014-06-16  2:19                 ` Russell Coker
2014-06-16 10:14               ` Lennart Poettering
2014-06-16 10:35                 ` Russell Coker
2014-06-16 11:16                   ` Austin S Hemmelgarn
2014-06-16 11:56                 ` Andrey Borzenkov
2014-06-16 16:05                 ` Josef Bacik
2014-06-16 19:52                   ` Martin
2014-06-16 20:20                     ` Josef Bacik
2014-06-17  0:15                     ` Austin S Hemmelgarn
2014-06-17  1:13                     ` cwillu
2014-06-17 12:24                       ` Martin
2014-06-17 17:56                       ` Chris Murphy
2014-06-17 18:46                       ` Filipe Brandenburger
2014-06-17 19:42                         ` Goffredo Baroncelli
2014-06-17 21:12                   ` Lennart Poettering
2014-06-16 16:32             ` Goffredo Baroncelli
2014-06-16 18:47               ` Goffredo Baroncelli
2014-06-19  1:13             ` Dave Chinner
2014-06-14 10:59         ` Kai Krakow
2014-06-15 22:43           ` [systemd-devel] " Lennart Poettering

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140615223421.GF24386@tango.0pointer.de \
    --to=lennart@poettering.net \
    --cc=david@fromorbit.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=lists@colorremedies.com \
    --cc=systemd-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).