linux-fsdevel.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@lst.de>
To: libc-hacker@sourceware.org, linux-fsdevel@vger.kernel.org
Cc: Trond Myklebust <trondmy@hammerspace.com>
Subject: posix_fallocate behavior in glibc
Date: Wed, 26 Jun 2024 08:01:34 +0200
Message-ID: <20240626060134.GA22955@lst.de>

Hi all,

Trond brought the glibc posix_fallocate behavior to my attention.

As a refresher, this is how Open Group defines posix_fallocate:

   The posix_fallocate() function shall ensure that any required storage
   for regular file data starting at offset and continuing for len bytes
   is allocated on the file system storage media. If posix_fallocate()
   returns successfully, subsequent writes to the specified file data
   shall not fail due to the lack of free space on the file system
   storage media.

The glibc implementation in sysdeps/posix/posix_fallocate.c, which is
also used by sysdeps/unix/sysv/linux/posix_fallocate.c as a fallback if
the fallocate syscall returns EOPNOTSUPP, works by doing single-byte
writes at intervals of min(f.f_bsize, 4096).
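
To make the pattern concrete, here is a minimal sketch of that kind of
emulation.  It is not the glibc source: the helper name emulate_fallocate,
the use of fstatvfs, and the 512-byte default for a zero block size are
assumptions made for illustration, and overflow checks are left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not glibc code): "allocate" [offset, offset + len)
   by rewriting one byte per block.  Returns 0 or an errno value. */
int emulate_fallocate(int fd, off_t offset, off_t len)
{
    struct statvfs vfs;
    struct stat st;

    if (offset < 0 || len <= 0)
        return EINVAL;
    if (fstatvfs(fd, &vfs) != 0 || fstat(fd, &st) != 0)
        return errno;

    /* Write interval: min(f_bsize, 4096); 512 is an assumed default. */
    off_t step = vfs.f_bsize == 0 ? 512
               : vfs.f_bsize < 4096 ? (off_t) vfs.f_bsize : 4096;
    assert(step > 0);

    off_t end = offset + len;
    /* Start so that the final write lands on the last byte of the range,
       which also extends the file to offset + len if it was shorter. */
    for (off_t pos = offset + (len - 1) % step; pos < end; pos += step) {
        char c = 0;

        /* Inside the existing file: read the byte so it is written back
           unchanged.  Past EOF: write a zero byte. */
        if (pos < st.st_size && pread(fd, &c, 1, pos) < 0)
            return errno;
        if (pwrite(fd, &c, 1, pos) != 1)
            return errno ? errno : EIO;
    }
    return 0;
}
```

Each of these writes dirties a block whether or not the file system then
reserves anything for later overwrites of the same range.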

This assumes that writes to a file guarantee allocating space for future
writes.  Such an assumption is false for out-of-place write file systems,
which have been around since at least the early 1990s, but have become
a lot more common in the last decade.  Native Linux examples are
all file systems sitting on zoned devices, where this is required
behavior, but also the nilfs2 file system or the LFS mode in f2fs.
On top of that it is fairly common for storage systems exposing
network file system access.

How can we get rid of this glibc fallback that makes the implementation
non-conformant and increases write amplification for no good reason?

Thread overview: 22+ messages
2024-06-26  6:01 Christoph Hellwig [this message]
2024-07-29 15:09 ` posix_fallocate behavior in glibc Christoph Hellwig
2024-07-29 15:11   ` Sam James
  -- strict thread matches above, loose matches on Subject: below --
2024-07-29 16:09 Christoph Hellwig
2024-07-29 17:23 ` Paul Eggert
2024-07-29 17:43   ` Christoph Hellwig
2024-07-29 17:54     ` Adhemerval Zanella Netto
     [not found]     ` <CAPBLoAf11hM0PLhqPG5gUyivU9U1manpOOhDWCPugUmWc1VVUw@mail.gmail.com>
2024-07-29 18:45       ` Christoph Hellwig
2024-07-29 17:57 ` Florian Weimer
2024-07-29 18:44   ` Christoph Hellwig
2024-07-29 18:52     ` Florian Weimer
2024-07-29 19:01       ` Christoph Hellwig
2024-07-29 19:23         ` Florian Weimer
2024-07-30 15:47           ` Christoph Hellwig
2024-07-30 16:11             ` Paul Eggert
2024-07-30 16:20               ` Christoph Hellwig
2024-07-30 17:03                 ` Florian Weimer
2024-07-30 17:08                   ` Christoph Hellwig
2024-07-30 17:29                     ` Florian Weimer
2024-07-30 17:52                   ` Mark Wielaard
2024-07-31  2:32                   ` Theodore Ts'o
2024-07-29 23:53       ` Dave Chinner
