From: Goffredo Baroncelli <kreijack@libero.it>
To: Dave Chinner <david@fromorbit.com>, Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org, systemd-devel@lists.freedesktop.org
Subject: Re: R: Re: Slow startup of systemd-journal on BTRFS
Date: Sat, 14 Jun 2014 00:19:31 +0200
Message-ID: <539B78F3.9070607@libero.it>
In-Reply-To: <20140612232453.GR9508@dastard>
Hi Dave
On 06/13/2014 01:24 AM, Dave Chinner wrote:
> On Thu, Jun 12, 2014 at 12:37:13PM +0000, Duncan wrote:
>> Goffredo Baroncelli <kreijack@libero.it> posted on Thu, 12 Jun 2014
>> 13:13:26 +0200 as excerpted:
>>
>>>> systemd has a very stupid journal write pattern. It checks if there is
>>>> space in the file for the write, and if not it fallocates the small
>>>> amount of space it needs (it does *4 byte* fallocate calls!) and then
>>>> does the write to it. All this does is fragment the crap out of the log
>>>> files because the filesystems cannot optimise the allocation patterns.
>>>
>>> I checked the code, and to me it seems that the fallocate() calls are done in
>>> FILE_SIZE_INCREASE units (currently 8MB).
>>
>> FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think
>> actually pretty much equally bad without NOCOW set on the file.
>
> So maybe it's been fixed in systemd since the last time I looked.
> Yup:
>
> http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
>
> The reason it was changed? To "save a syscall per append", not to
> prevent fragmentation of the file, which was the problem everyone
> was complaining about...
Thanks for pointing that out. However, I am performing my tests on Fedora 20 with systemd-208, which seems to include this change.
>
>> Why? Because btrfs data blocks are 4 KiB. With COW, the effect for
>> either 4 byte or 8 MiB file allocations is going to end up being the
>> same, forcing (repeated until full) rewrite of each 4 KiB block into its
>> own extent.
I am reaching the conclusion that fallocate is not the problem. Each fallocate() call grows the file by about 8MB, which is enough for a fair amount of logging, so it is not called very often.
I have to investigate further what happens when the logs are copied from /run to /var/log/journal: this is when journald seems to slow everything down.
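To make the "not called very often" point concrete, here is a minimal sketch of a grow-in-chunks append pattern. It is not systemd's journal-file.c code: only the FILE_SIZE_INCREASE name and its 8MB value come from this thread (taken here as 8 MiB), while `round_up`, `ensure_space` and `count_fallocate_calls` are hypothetical helpers written for illustration.

```python
import os

# Assumption: "8MB" from the thread, interpreted as 8 MiB.
FILE_SIZE_INCREASE = 8 * 1024 * 1024


def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit."""
    return (n + unit - 1) // unit * unit


def ensure_space(fd: int, allocated: int, end_of_write: int) -> int:
    """Hypothetical helper: make sure bytes [0, end_of_write) are allocated.

    posix_fallocate() is only called when a write would run past the
    currently allocated size; the allocation is then grown to the next
    FILE_SIZE_INCREASE boundary. Returns the new allocated size.
    """
    if end_of_write <= allocated:
        return allocated  # enough room already: no syscall at all
    new_size = round_up(end_of_write, FILE_SIZE_INCREASE)
    os.posix_fallocate(fd, allocated, new_size - allocated)
    return new_size


def count_fallocate_calls(append_sizes, chunk: int = FILE_SIZE_INCREASE) -> int:
    """Simulate a stream of appends (no I/O) and count how often the file grows."""
    allocated = used = calls = 0
    for size in append_sizes:
        used += size
        if used > allocated:
            allocated = round_up(used, chunk)
            calls += 1
    return calls
```

For example, 100,000 appends of 200 bytes (about 20 MB of log) need only 3 allocation calls with 8 MiB chunks (`count_fallocate_calls([200] * 100_000)`), versus 100,000 if the file were grown by exactly the bytes each append needs (`chunk=1`). This only models the number of calls; as Duncan points out above, it says nothing about how a COW filesystem later lays out rewrites of those blocks.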
I have prepared a PC which reboots continuously; I am collecting the time required to finish the boot vs. the fragmentation of the system.journal file vs. the number of boots. The results are dramatic: after 20 reboots, the boot time increases by 20-30 seconds. Defragmenting system.journal brings the boot time back to the original value, but after another 20 reboots the boot again takes 20-30 seconds longer...
It is a slow PC, but I saw the same behavior on a more modern PC too (i5 with 8GB).
Both PCs have a mechanical hard disk...
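A per-boot sample for a test like the one described above could be recorded along these lines. This is only a sketch: the CSV layout and the helper names are hypothetical, and how the boot duration is measured is left to the caller because the message does not say. The extent count comes from filefrag, whose summary-line format ("path: N extents found") is the one shown later in this message.

```python
import csv
import re
import subprocess

# Matches filefrag's summary line, e.g. "messages: 29 extents found";
# the singular "1 extent found" is accepted too.
_FILEFRAG_RE = re.compile(r":\s+(\d+) extents? found\s*$")


def parse_filefrag(output: str) -> int:
    """Return the extent count from filefrag's (last) summary line."""
    for line in reversed(output.strip().splitlines()):
        m = _FILEFRAG_RE.search(line)
        if m:
            return int(m.group(1))
    raise ValueError("no 'extents found' line in filefrag output")


def extent_count(path: str) -> int:
    """Run filefrag on path (journal files usually need root) and parse it."""
    out = subprocess.run(["filefrag", path], capture_output=True,
                         text=True, check=True).stdout
    return parse_filefrag(out)


def log_sample(csv_path: str, boot_no: int, boot_seconds: float,
               journal_path: str) -> None:
    """Append one (boot number, boot time, extent count) row to a CSV file.

    boot_seconds is supplied by the caller; measuring it is outside the
    scope of this sketch.
    """
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([boot_no, boot_seconds, extent_count(journal_path)])
```

Searching the output from the end means the same parser also works with `filefrag -v`, which prints per-extent lines before the summary.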
>
> And that's now a btrfs problem.... :/
Are you sure?
ghigo@venice:/var/log$ sudo filefrag messages
messages: 29 extents found
ghigo@venice:/var/log$ sudo filefrag journal/*/system.journal
journal/41d686199835445395ac629d576dfcb9/system.journal: 1378 extents found
So the old rsyslog creates files with far fewer fragments. BTRFS (and, it seems, XFS too) certainly exposes this problem more than other filesystems, but systemd also seems to create a lot of extents.
BR
G.Baroncelli
>
> Cheers,
>
> Dave.
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5