From: Lennart Poettering <lennart@poettering.net>
To: Chris Murphy <lists@colorremedies.com>
Cc: Dave Chinner <david@fromorbit.com>,
	kreijack@inwind.it,
	systemd Mailing List <systemd-devel@lists.freedesktop.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [systemd-devel] Slow startup of systemd-journal on BTRFS
Date: Mon, 16 Jun 2014 00:34:21 +0200
Message-ID: <20140615223421.GF24386@tango.0pointer.de>
In-Reply-To: <5E3380D5-FF9F-4152-B115-7D16CD8CC215@colorremedies.com>

On Wed, 11.06.14 20:32, Chris Murphy (lists@colorremedies.com) wrote:

> > systemd has a very stupid journal write pattern. It checks if there
> > is space in the file for the write, and if not it fallocates the
> > small amount of space it needs (it does *4 byte* fallocate calls!)

Not really the case. 

http://cgit.freedesktop.org/systemd/systemd/tree/src/journal/journal-file.c#n354

We allocate 8 MB at minimum.
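
To illustrate what growing a file in steps of at least 8 MB can look like, here is a minimal sketch. It is not copied from journal-file.c; the names ALLOC_STEP and round_up_to_step are hypothetical, and the real code may compute the size differently.

```c
#include <stdint.h>

/* Hypothetical allocation step: 8 MB, the minimum mentioned above. */
#define ALLOC_STEP (8ULL * 1024ULL * 1024ULL)

/* Round a wanted file size up to the next multiple of ALLOC_STEP, so that
 * the file only ever grows in large chunks rather than a few bytes at a time. */
static uint64_t round_up_to_step(uint64_t wanted) {
    return ((wanted + ALLOC_STEP - 1) / ALLOC_STEP) * ALLOC_STEP;
}
```

With this rounding, a request that needs even one byte beyond the current allocation extends the file by a full 8 MB step.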

> > and then does the write to it.  All this does is fragment the crap
> > out of the log files because the filesystems cannot optimise the
> > allocation patterns.

Well, it would be good if you'd tell me what to do instead...

I am invoking fallocate() in advance because we write those files with
mmap(), and that would otherwise trigger SIGBUS for the most boring of
reasons, such as a full disk or an exhausted quota. Hence, before we do
anything like that, we invoke fallocate() to ensure that the space is
actually available... As far as I can see, that is pretty much in line
with what fallocate() is supposed to be useful for; the man page says
this explicitly:

     "...After a successful call to posix_fallocate(), subsequent writes
      to bytes in the specified range are guaranteed not to fail because
      of lack of disk space."

Happy to be informed that the man page is wrong. 
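
The pattern described above (reserve the space first, then write through a mapping) can be sketched roughly as follows. This is an illustration under assumptions, not the journal code: the function name reserve_and_write and its parameters are hypothetical, and it relies only on the documented posix_fallocate() guarantee quoted above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve `size` bytes for `fd`, map them, and copy
 * `len` bytes of `data` to the start of the mapping.
 * Returns 0 on success, or an errno-style error code. */
static int reserve_and_write(int fd, off_t size, const void *data, size_t len) {
    if (size <= 0 || len > (size_t) size)
        return EINVAL;

    /* posix_fallocate() returns the error number directly, not via errno.
     * A full disk or quota shows up here as an ordinary error code. */
    int r = posix_fallocate(fd, 0, size);
    if (r != 0)
        return r;

    void *p = mmap(NULL, (size_t) size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return errno;

    /* The blocks behind this range are already reserved, so this store
     * should not fault for lack of disk space. */
    memcpy(p, data, len);

    if (munmap(p, (size_t) size) < 0)
        return errno;
    return 0;
}
```

The point of the ordering is that an out-of-space condition is reported as a return value from posix_fallocate(), where it can be handled, rather than as a signal delivered in the middle of a memory write.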

I am also happy to change our code, if it really is the wrong thing to
do. Note however that I generally favour correctness and relying on
documented behaviour, instead of nebulous optimizations whose effects
might change with different file systems or kernel versions...

> > Yup, it fragments journal files on XFS, too.
> > 
> > http://oss.sgi.com/archives/xfs/2014-03/msg00322.html
> > 
> > IIRC, the systemd developers consider this a filesystem problem and
> > so refused to change the systemd code to be nice to the filesystem
> > allocators, even though they don't actually need to use fallocate...

What? No need to be a dick. Nobody ever pinged me about this. And yeah, I
think I have a very good reason to use fallocate(). In fact, it is the
only reason the man page explicitly mentions.

Lennart

-- 
Lennart Poettering, Red Hat
