From: Russell Coker <russell@coker.com.au>
To: Lennart Poettering <lennart@poettering.net>
Cc: kreijack@inwind.it, Duncan <1i5t5.duncan@cox.net>,
linux-btrfs@vger.kernel.org, systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] Slow startup of systemd-journal on BTRFS
Date: Mon, 16 Jun 2014 10:17:39 +1000
Message-ID: <1709025.rRUgx5gMp1@xev>
In-Reply-To: <20140615221307.GE24386@tango.0pointer.de>
On Mon, 16 Jun 2014 00:13:07 Lennart Poettering wrote:
> On Sat, 14.06.14 09:52, Goffredo Baroncelli (kreijack@libero.it) wrote:
> > > Which effectively means that by the time the 8 MiB is filled, each 4 KiB
> > > block has been rewritten to a new location and is now an extent unto
> > > itself. So now that 8 MiB is composed of 2048 new extents, each one a
> > > single 4 KiB block in size.
> >
> > Several people pointed fallocate as the problem. But I don't
> > understand the reason.
>
> BTW, the reason we use fallocate() in journald is not about trying to
> optimize anything. It's only used for one reason: to avoid SIGBUS on
> disk/quota full, since we actually write everything to the files using
> mmap(). I mean, writing things with mmap() is always problematic, and
> handling write errors is awfully difficult, but at least two of the most
> common reasons for failure we'd like protect against in advance, under
> the assumption that disk/quota full will be reported immediately by the
> fallocate(), and the mmap writes later on will then necessarily succeed.
I just did some tests using fallocate(1), both with and without the -n
option, which appeared to make no difference.
I started by allocating a 24G file on a 106G filesystem that had 30G free
according to df. The first run took almost 2 minutes of system CPU time
on a Q8400 CPU.
I then made a snapshot of the subvol and used dd with the conv=notrunc
option to overwrite the file. The reported free disk space decreased in line
with the progress of dd. So when snapshots are involved the space is actually
USED (not just reserved) when you call fallocate, and there is no guarantee
that space will be available when you later write to the file.
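For reference, the pattern being discussed looks roughly like the sketch
below. It is Python rather than journald's actual C code, and the function
names and the file name are made up for illustration. The idea is that
posix_fallocate() should report disk/quota full immediately, so that later
stores into the mapping do not fail.

```python
import mmap
import os
import tempfile


def preallocate_and_map(path, size):
    """Reserve `size` bytes for `path`, then map the file for writing."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Raises OSError (e.g. ENOSPC or EDQUOT) here, up front, instead of
        # leaving a later store into the mapping to fail.
        os.posix_fallocate(fd, 0, size)
        # Shared read/write mapping by default; mmap keeps its own
        # duplicate of the descriptor.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


def demo():
    """Preallocate 8 MiB in a temporary directory and write through mmap."""
    size = 8 * 1024 * 1024
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.journal")  # hypothetical name
        mapping = preallocate_and_map(path, size)
        mapping[:5] = b"hello"
        mapping.flush()
        mapping.close()
        return os.path.getsize(path)
```

This only shows the calling pattern. As the snapshot test above shows, on
BTRFS a successful preallocation does not by itself mean the later writes
have space to land in.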
My systems have cron jobs to make read-only snapshots of all subvols. On
these systems there is no guarantee that mmap writes will succeed. Quite
apart from that, the variety of problems BTRFS has when it runs out of disk
space makes me more careful to avoid that on BTRFS than on other filesystems.
> I am not really following though why this trips up btrfs though. I am
> not sure I understand why this breaks btrfs COW behaviour. I mean,
> fallocate() isn't necessarily supposed to write anything really, it's
> mostly about allocating disk space in advance. I would claim that
> journald's usage of it is very much within the entire reason why it
> exists...
I don't believe that fallocate() makes any difference to fragmentation on
BTRFS. Blocks will be allocated when writes occur, so regardless of a
fallocate() call the usage pattern in systemd-journald will cause
fragmentation.
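To put a number on that, here is the arithmetic behind the 8 MiB / 4 KiB
figures quoted at the top of this message, as a small Python sketch. It
assumes the worst case, where every block has been rewritten to a new
location; the function name is only illustrative.

```python
def extents_after_full_rewrite(file_size, block_size):
    """Worst case: each block has been COWed into an extent of its own."""
    return file_size // block_size


MIB = 1024 * 1024
KIB = 1024

print(extents_after_full_rewrite(8 * MIB, 4 * KIB))  # prints 2048
```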
> Anyway, happy to change these things around if necesary, but first I'd
> like to have a very good explanation why fallocate() wouldn't be the
> right thing to invoke here, and a suggestion what we should do instead
> to cover this usecase...
Systemd could request that the files in question be defragmented.
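As a rough illustration of what such a request could look like from user
space, the Python sketch below shells out to the btrfs(8) tool's
"filesystem defragment" subcommand. This is an assumption about one
possible approach, not something systemd does; the helper names and the
example path are hypothetical.

```python
import shutil
import subprocess


def defragment_command(path):
    """Build the btrfs-progs command line that defragments one file."""
    return ["btrfs", "filesystem", "defragment", path]


def defragment(path):
    """Run the defragment command; return True only if it succeeded."""
    if shutil.which("btrfs") is None:
        return False  # btrfs-progs is not installed
    result = subprocess.run(defragment_command(path), check=False)
    return result.returncode == 0


# Hypothetical journal file path, for illustration only.
print(defragment_command("/var/log/journal/example.journal"))
```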
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/