From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Junio C Hamano <gitster@pobox.com>,
Christoph Hellwig <hch@lst.de>,
Git Mailing List <git@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Jacob Vosmaer <jacob@gitlab.com>
Subject: Re: [PATCH] enable core.fsyncObjectFiles by default
Date: Thu, 17 Sep 2020 13:06:50 +0200
Message-ID: <87sgbghdbp.fsf@evledraar.gmail.com>
In-Reply-To: <20180117235220.GD6948@thunk.org>
On Thu, Jan 18 2018, Theodore Ts'o wrote:
> On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
>>
>> Now re-do the test while another process writes to a totally unrelated
>> huge file (say, do an ISO file copy or something).
>>
>> That was the thing that several filesystems get completely and
>> horribly wrong. Generally _particularly_ the logging filesystems that
>> don't even need the fsync, because they use a single log for
>> everything (so fsync serializes all the writes, not just the writes to
>> the one file it's fsync'ing).
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back then. All of the modern file systems now
> do delayed allocation, which means that an fsync of one file doesn't
> actually imply an fsync of another file. Hence...
>
>> The original git design was very much to write each object file
>> without any syncing, because they don't matter since a new object file
>> - by definition - isn't really reachable. Then sync before writing the
>> index file or a new ref.
>
> This isn't really safe any more. Yes, there's a single log. But
> files which are subject to delayed allocation are in the page cache,
> and just because you fsync the index file doesn't mean that the object
> file is now written to disk. It was true for ext3, but it's not true
> for ext4, xfs, btrfs, etc.
>
> The good news is that if you have another process downloading a huge
> ISO image, the fsync of the index file won't force the ISO file to be
> written out. The bad news is that it won't force out the other git
> object files, either.
>
> Now, there is a potential downside of fsync'ing each object file, and
> that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and
> even on a SSD, it's not optimal to call CACHE FLUSH thousands of times
> in a second. So if you are creating thousands of tiny files, and you
> fsync each one, each fsync(2) call is a serializing instruction, which
> means it won't return until that one file is written to disk. If you
> are writing lots of small files, and you are using a HDD, you'll be
> bottlenecked to around 30 files per second on a 5400 RPM HDD, and this
> is true regardless of what file system you use, because the
> bottleneck is the CACHE FLUSH operation, and how you organize the metadata
> and how you do the block allocation, is largely lost in the noise
> compared to the CACHE FLUSH command, which serializes everything.
>
> There are solutions to this; you could simply not call fsync(2) a
> thousand times, and instead write a pack file, and call fsync once on
> the pack file. That's probably the smartest approach.
>
> You could also create a thousand threads, and call fsync(2) on those
> thousand threads at roughly the same time. Or you could use a
> bleeding edge kernel with the latest AIO patch, and use the newly
> added IOCB_CMD_FSYNC support.
>
> But I'd simply recommend writing a pack and fsync'ing the pack,
> instead of trying to write a gazillion object files. (git-repack -A,
> I'm looking at you....)
>
> - Ted
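To make the trade-off in the quoted message concrete, here is a minimal Python sketch contrasting one fsync(2) per small file with a single fsync(2) on one combined file. It is not git's code: the function names and the length-prefixed "pack" framing are hypothetical and exist only for illustration. The returned count is the number of fsync calls, which is the cost the quoted text is concerned with.

```python
import os
import tempfile


def write_files_fsync_each(directory, blobs):
    """Write each blob to its own file and fsync every one.

    Returns the number of fsync calls issued (one per file).
    """
    syncs = 0
    for name, data in blobs.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to flush this file to disk
            syncs += 1
    return syncs


def write_pack_fsync_once(path, blobs):
    """Write all blobs into one file and fsync it once.

    The framing (4-byte name length, name, 4-byte data length, data) is a
    made-up format for this sketch, not git's pack format.
    Returns the number of fsync calls issued (always 1).
    """
    with open(path, "wb") as f:
        for name, data in sorted(blobs.items()):
            encoded = name.encode()
            f.write(len(encoded).to_bytes(4, "big"))
            f.write(encoded)
            f.write(len(data).to_bytes(4, "big"))
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return 1


if __name__ == "__main__":
    demo = {f"obj{i}": b"x" * 100 for i in range(1000)}
    with tempfile.TemporaryDirectory() as d:
        print("per-file fsync calls:", write_files_fsync_each(d, demo))
        print("single-file fsync calls:",
              write_pack_fsync_once(os.path.join(d, "demo.pack"), demo))
```

Timing the two functions on a given disk would show how much the per-file variant costs there; the sketch deliberately makes no claim about actual numbers.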
[I didn't find an ideal message to reply to in this thread, but this
one seemed like the best candidate]
Just an update on this, since I went back and looked at this thread:
about a year ago GitLab turned on core.fsyncObjectFiles=true by
default.
The reason is detailed in [1]. tl;dr: an empty loose object file issue
on ext4, allegedly caused by a lack of core.fsyncObjectFiles=true, but I
didn't do any root cause analysis. Just noting it here for future
reference.
1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774
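For readers unfamiliar with the pattern under discussion, here is a minimal Python sketch of a write-then-fsync-then-rename sequence for a single file, with an optional fsync of the containing directory. It is an illustration under stated assumptions, not git's implementation and not a diagnosis of the GitLab issue: the function name, the `sync_dir` parameter and the `tmp_obj_` prefix are hypothetical, and opening a directory with os.open() assumes a POSIX system.

```python
import os
import tempfile


def write_file_durably(path, data, sync_dir=True):
    """Write data to path so that the file contents are flushed before the
    final name becomes visible.

    Steps: write to a temporary file in the same directory, fsync it,
    rename it into place, then (optionally) fsync the directory so the
    new directory entry is flushed as well.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="tmp_obj_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # flush file contents before the rename
        os.replace(tmp_path, path)  # atomic rename within one file system
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if sync_dir:
        # POSIX assumption: a directory can be opened read-only and fsync'ed.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "object")
        write_file_durably(target, b"example contents")
        print(os.listdir(d))
```

Without the fsync before the rename, a file system that delays allocation may persist the new name before the data, which is one commonly described way for a zero-length file to survive a crash. Whether that is what happened in [1] is not something this sketch establishes.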