Re: Wislist for Linux from the mold linker's POV

public inbox for linux-kernel@vger.kernel.org

From: "Theodore Ts'o" <tytso@mit.edu>
To: "Niklas Hambüchen" <mail@nh2.me>
Cc: Rui Ueyama <rui314@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Florian Weimer <fw@deneb.enyo.de>
Subject: Re: Wislist for Linux from the mold linker's POV
Date: Fri, 29 Nov 2024 08:12:44 -1000
Message-ID: <20241129181244.GA11702@mit.edu>
In-Reply-To: <2c33be3f-8c41-48f1-a6ad-b4ea00ec515f@nh2.me>

On Fri, Nov 29, 2024 at 06:38:47AM +0100, Niklas Hambüchen wrote:
> Turns out, `ext4` has built in a feature to work around bad applications forgetting `fsync()`:
> 
> `close()`ing new files is fast.
> But if you `close()` existing files after writing them from scratch, or atomic-rename something replacing them, ext4 will insert an `fsync()`!

It's not actually an fsync() in the close() case.  We initiate
writeback, but we don't actually wait for the writes to complete on
the close().  In the case of rename(), we do wait for the writes to
complete before the file system transaction which commits the
rename(2) is allowed to complete.  But in the case where the
application programmer is too lazy to call fsync(2), the delayed
completion of the transaction commit is the implicit commit, and
nothing is blocked behind it.  (See below for more details.)

But yes, the reason behind this is applications such as tuxracer
writing the top-ten score file, and then shutting down OpenGL, and the
out-of-tree nvidia driver would sometimes^H^H^H^H^H^H^H^H^H always
crash, leaving a corrupted or missing top-ten score file, and this
resulted in a bunch of users whinging.

Also, at one point, both the KDE and Gnome text editors did the
open with O_TRUNC and rewrite, because it was the simplest way to
avoid losing the extended attributes (otherwise the application
programmers would have to actually copy the extended attributes, and
That Was Too Hard).  I don't know why programmers would edit precious
source files using something *other* than emacs, or vi, but....

In essence, file system developers are massively outnumbered by
application programmers, and for some reason as a class application
programmers don't seem to be very careful about data corruption
compared to file system developers --- and users *always* blame the
file system developers.

As Niklas points out in his reference, this can be disabled by a mount
option, noauto_da_alloc:

   auto_da_alloc(*), noauto_da_alloc

       Many broken applications don’t use fsync() when replacing
       existing files via patterns such as fd =
       open(“foo.new”)/write(fd,..)/close(fd)/ rename(“foo.new”,
       “foo”), or worse yet, fd = open(“foo”,
       O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled,
       ext4 will detect the replace-via-rename and
       replace-via-truncate patterns and force that any delayed
       allocation blocks are allocated such that at the next journal
       commit, in the default data=ordered mode, the data blocks of
       the new file are forced to disk before the rename() operation
       is committed. This provides roughly the same level of
       guarantees as ext3, and avoids the “zero-length” problem that
       can happen when a system crashes before the delayed allocation
       blocks are forced to disk.
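
For contrast with the "broken" patterns quoted above, here is a minimal
sketch of a replace-via-rename that does its own fsync() and so does not
depend on auto_da_alloc.  The function name atomic_replace and the
".tmp-" prefix are made up for illustration; the extra fsync() on the
containing directory is a common convention for making the rename itself
durable, not something the text above requires.  Note that this sketch
does not preserve the old file's permissions or extended attributes.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace the contents of `path` with `data` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory so that the
    # rename() below stays within a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Explicitly push the new data to disk *before* the rename,
            # instead of relying on the file system to do it for us.
            os.fsync(f.fileno())
        # Atomically switch the name over to the new file.
        os.rename(tmp_path, path)
    except BaseException:
        # On any failure, do not leave the temporary file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # fsync the directory so the rename itself reaches disk.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Called as atomic_replace("foo", b"new contents"), a reader of "foo"
should see either the complete old contents or the complete new ones.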

So if you care about performance above all else, and you trust all of
the application programmers responsible for programs on your system
to be sufficiently careful, feel free to use the noauto_da_alloc
option.  :-)
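
To check whether a given mount already has the option set, something
like the following sketch may help.  It parses text in the format of
/proc/mounts; the helper name has_mount_option is hypothetical, and the
sketch assumes that ext4 lists noauto_da_alloc among the mount options
there when it has been set (worth verifying on your own kernel).

```python
from typing import Optional


def has_mount_option(mounts_text: str, mount_point: str,
                     option: str) -> Optional[bool]:
    """Return True/False if `mount_point` is listed in `mounts_text`
    with/without `option`, or None if the mount point is not listed.

    `mounts_text` is expected to be in /proc/mounts format:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries for the same mount point override earlier
            # ones, so keep scanning and remember the last match.
            result = option in fields[3].split(",")
    return result
```

For example, reading /proc/mounts into a string and calling
has_mount_option(text, "/", "noauto_da_alloc") reports on the root file
system.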

					- Ted
