From: Lennart Poettering <lennart@poettering.net>
To: Duncan <1i5t5.duncan@cox.net>
Cc: systemd-devel@lists.freedesktop.org, linux-btrfs@vger.kernel.org
Subject: Re: [systemd-devel] Slow startup of systemd-journal on BTRFS
Date: Mon, 16 Jun 2014 00:39:39 +0200
Message-ID: <20140615223939.GG24386@tango.0pointer.de>
In-Reply-To: <pan$c82aa$ea13b40$dc4fc67d$63d84512@cox.net>
On Sun, 15.06.14 05:43, Duncan (1i5t5.duncan@cox.net) wrote:
> The base problem isn't fallocate per se, though it's the trigger in
> this case. The base problem is that for COW-based filesystems, *ANY*
> rewriting of existing file content results in fragmentation.
>
> It just so happens that the only reason there's existing file content to
> be rewritten (as opposed to simply appending) in this case, is because of
> the fallocate. The rewrite of existing file content is the problem, but
> the existing file content is only there in this case because of the
> fallocate.
>
> Taking a step back...
>
> On a non-COW filesystem, allocating 8 MiB ahead and writing into it
> rewrites into the already allocated location, thus guaranteeing extents
> of 8 MiB each, since once the space is allocated it's simply rewritten in-
> place. Thus, on a non-COW filesystem, pre-allocating in something larger
> than single filesystem blocks when an app knows the data is eventually
> going to be written in to fill that space anyway is a GOOD thing, which
> is why systemd is doing it.
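Nope, that's not why we do it. We do it to avoid SIGBUS on disk full:
journald writes to its journal files through mmap(), and a store into a
mapped page that has no disk space behind it is reported as SIGBUS
rather than as a write error.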
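A minimal sketch of that pattern, written in Python rather than journald's C and not taken from journald's source: reserve the blocks with posix_fallocate() before mapping the file, so that a full disk shows up as an ENOSPC error from the reservation call instead of as SIGBUS on a later store through the mapping. The helper name `write_via_mmap` and the 8 MiB default window (borrowed from the figure in the quoted text) are illustrative assumptions. Extending the file with ftruncate() alone would not help, because that leaves a sparse hole whose blocks still have to be allocated when the page is first dirtied.

```python
import mmap
import os

# Hypothetical helper, not journald code; 8 MiB is an illustrative default.
DEFAULT_RESERVE = 8 * 1024 * 1024


def write_via_mmap(path: str, data: bytes, reserve: int = DEFAULT_RESERVE) -> None:
    """Reserve `reserve` bytes on disk, then write `data` at offset 0 through mmap."""
    if len(data) > reserve:
        raise ValueError("data does not fit in the reserved window")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocate the blocks up front (this also extends the file to `reserve`
        # bytes). On a full disk this raises OSError(ENOSPC) here, rather than
        # the kernel delivering SIGBUS later when a store hits an unbacked page.
        os.posix_fallocate(fd, 0, reserve)
        with mmap.mmap(fd, reserve, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:len(data)] = data  # plain memory store; no write() call involved
            m.flush()             # msync() the dirty page back to the file
    finally:
        os.close(fd)
```

Calling `write_via_mmap(path, b"hello", reserve=4096)` leaves a 4096-byte file that starts with `b"hello"` and is zero-filled after it.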
> But on a COW-based filesystem fallocate is the exact opposite, a BAD
> thing, because an fallocate forces the file to be written out at that
> size, effectively filled with nulls/blanks. Then the actual logging
> comes along and rewrites those nulls/blanks with actual data, but it's
> now a rewrite, and on a COW (copy-on-write) filesystem the rewritten
> block is copied elsewhere; it does NOT overwrite the existing
> null/blank block, and "elsewhere" by definition means detached from the
> previous blocks, thus in an extent all by itself.
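The extent argument in the two quoted paragraphs can be made concrete with a deliberately crude model. It illustrates the claim as stated in the quote, not btrfs's real allocator: it assumes 4 KiB blocks (so the 8 MiB window is 2048 blocks), that each journal write gets its own allocation, and that some unrelated writer takes one block between consecutive journal writes. Real btrfs batches writes per transaction and may well place them next to each other, and whether fallocate() on a COW filesystem really behaves like this is exactly what Lennart questions in his reply. The function names are hypothetical.

```python
from typing import List


def count_extents(physical: List[int]) -> int:
    """Count runs of consecutive physical block numbers, in logical file order."""
    if not physical:
        return 0
    return 1 + sum(1 for a, b in zip(physical, physical[1:]) if b != a + 1)


def simulate(n_blocks: int, cow: bool) -> int:
    """Toy model: reserve n_blocks contiguously, then fill them one block at a
    time while an unrelated writer takes one block between journal writes.
    Returns the number of extents the journal file ends up with."""
    next_free = 0
    # "fallocate": one contiguous reservation, i.e. a single extent.
    mapping = list(range(next_free, next_free + n_blocks))
    next_free += n_blocks
    for i in range(n_blocks):
        if cow:
            # Copy-on-write: the new data goes to fresh space, and the
            # reserved block for logical block i is abandoned.
            mapping[i] = next_free
            next_free += 1
        # Non-COW: overwrite in place; mapping[i] is left unchanged.
        next_free += 1  # the unrelated writer allocates one block meanwhile
    return count_extents(mapping)
```

Under these assumptions `simulate(2048, cow=False)` returns 1 (one 8 MiB extent) and `simulate(2048, cow=True)` returns 2048 (one extent per 4 KiB block). The interleaved writer is what makes the COW case worst-case; without it, this model would place the rewrites contiguously.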
Well, quite frankly I am not entirely sure why fallocate() would be of any
use like that on COW file systems, if this is really how it is
implemented... I mean, as I understood fallocate() -- and as the man
page suggests -- it is something for reserving space on disk, not for
writing out anything. This is why journald is invoking it, to reserve
the space, so that later write accesses to it will not require any
reservation anymore, and hence are unlikely to fail.
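To see the difference between reserving space and writing it out, the sketch below (an illustration under stated assumptions, not journald code) creates two files of the same length: one extended with ftruncate(), which only sets the size, and one with posix_fallocate(), which asks the filesystem to allocate the blocks. It compares st_size with the space actually allocated, taken from st_blocks, which Linux counts in 512-byte units. The helper names are hypothetical. On filesystems without native fallocate support, glibc's posix_fallocate() falls back to writing into every block, so the reserved file ends up allocated either way, just by a different route.

```python
import os


def allocated(path: str) -> tuple:
    """Return (apparent size, bytes actually allocated) for `path`."""
    st = os.stat(path)
    # On Linux st_blocks is counted in 512-byte units, whatever st_blksize is.
    return st.st_size, st.st_blocks * 512


def compare_reservation(directory: str, length: int) -> dict:
    """Create a sparse file and a reserved file of `length` bytes; report both."""
    sparse = os.path.join(directory, "sparse")
    reserved = os.path.join(directory, "reserved")
    with open(sparse, "wb") as f:
        os.ftruncate(f.fileno(), length)            # sets the size only
    with open(reserved, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, length)   # sets the size and reserves blocks
    return {"sparse": allocated(sparse), "reserved": allocated(reserved)}
```

On ext4, XFS or btrfs one would expect the sparse file to report roughly `(length, 0)` and the reserved one roughly `(length, length)`. Both read back as zeros, but only the reserved one is guaranteed to have disk space behind it.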
Lennart
--
Lennart Poettering, Red Hat