linux-fsdevel.vger.kernel.org archive mirror
* posix_fallocate behavior in glibc
From: Christoph Hellwig @ 2024-06-26  6:01 UTC
  To: libc-hacker, linux-fsdevel; +Cc: Trond Myklebust

Hi all,

Trond brought the glibc posix_fallocate behavior to my attention.

As a refresher, this is how Open Group defines posix_fallocate:

   The posix_fallocate() function shall ensure that any required storage
   for regular file data starting at offset and continuing for len bytes
   is allocated on the file system storage media. If posix_fallocate()
   returns successfully, subsequent writes to the specified file data
   shall not fail due to the lack of free space on the file system
   storage media.

The glibc implementation in sysdeps/posix/posix_fallocate.c, which is
also used by sysdeps/unix/sysv/linux/posix_fallocate.c as a fallback if
the fallocate syscall returns EOPNOTSUPP, is implemented by doing
single-byte writes at intervals of min(f.f_bsize, 4096).
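A simplified sketch of that write-based fallback, to make the write pattern concrete. It assumes Linux (fstatfs() from <sys/vfs.h>), and the names emulate_fallocate and fallback_increment are hypothetical. Skipping blocks that already contain non-zero data and using a 512-byte stride when f_bsize is 0 are assumptions based on glibc's source rather than this mail, so check them against sysdeps/posix/posix_fallocate.c. The Linux posix_fallocate() only reaches a loop like this after the fallocate syscall has returned EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Stride between touched bytes: min(f_bsize, 4096), or 512 if the
   file system reports a zero block size.  Returns 0 on error. */
static unsigned fallback_increment(int fd)
{
    struct statfs f;
    if (fstatfs(fd, &f) != 0)
        return 0;
    if (f.f_bsize == 0)
        return 512;
    return f.f_bsize < 4096 ? (unsigned)f.f_bsize : 4096;
}

/* Hypothetical name; simplified sketch of the write-based fallback.
   Returns 0 or an errno value, like posix_fallocate(). */
static int emulate_fallocate(int fd, off_t offset, off_t len)
{
    if (offset < 0 || len <= 0)
        return EINVAL;

    unsigned inc = fallback_increment(fd);
    if (inc == 0)
        return errno;
    assert(inc <= 4096);

    struct stat st;
    if (fstat(fd, &st) != 0)
        return errno;

    /* One byte per stride; the last touched byte is offset + len - 1,
       so the file grows to at least offset + len. */
    for (off_t pos = offset + (len - 1) % inc; pos < offset + len; pos += inc) {
        if (pos < st.st_size) {
            unsigned char c;
            ssize_t r = pread(fd, &c, 1, pos);
            if (r < 0)
                return errno;
            if (r == 1 && c != 0)
                continue;   /* non-zero data: block already holds data */
        }
        /* Write a single NUL byte.  On an in-place file system this
           allocates the block; on an out-of-place one it reserves
           nothing for later overwrites. */
        if (pwrite(fd, "", 1, pos) != 1)
            return errno;
    }
    return 0;
}
```

Note that the loop issues one tiny write per block of the range, which is where the extra write amplification comes from.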

This assumes that writing to a file guarantees that space is allocated
for future writes.  That assumption is false for write-out-of-place file
systems, which have been around since at least the early 1990s but have
become a lot more common in the last decade.  Native Linux examples
include all file systems sitting on zoned devices, where out-of-place
writes are required behavior, but also the nilfs2 file system and the
LFS mode of f2fs.  On top of that, out-of-place writes are fairly common
in storage systems that expose network file system access.
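A toy model of why the single-byte writes reserve nothing on an out-of-place file system. Everything here (struct toy_fs, toy_write, the 8-block size) is hypothetical and ignores garbage collection: on an in-place file system an overwrite reuses the block the first write allocated, while on an out-of-place one every write, including an overwrite, needs a fresh free block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 8

struct toy_fs {
    bool out_of_place;          /* true: log-structured / zoned style */
    size_t free_blocks;         /* physical blocks still available */
    bool mapped[TOY_NBLOCKS];   /* logical block already has storage? */
};

/* Write one logical block; returns false where a real file system
   would fail with ENOSPC. */
static bool toy_write(struct toy_fs *fs, size_t lblock)
{
    assert(lblock < TOY_NBLOCKS);

    /* In place: an already-mapped block is overwritten where it is. */
    if (fs->mapped[lblock] && !fs->out_of_place)
        return true;

    /* New block, or any write on an out-of-place file system: needs a
       free physical block.  The old copy would only be reclaimed later
       by garbage collection, which this toy omits. */
    if (fs->free_blocks == 0)
        return false;
    fs->free_blocks--;
    fs->mapped[lblock] = true;
    return true;
}
```

With four free blocks, "preallocating" blocks 0 to 3 by writing them succeeds on both models, but a later overwrite of block 0 succeeds only in the in-place model; the out-of-place model runs out of space, which is the failure posix_fallocate() promises cannot happen.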

How can we get rid of this glibc fallback, which makes posix_fallocate
non-conformant on these file systems and increases write amplification
for no good reason?
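For comparison, a sketch of what a caller sees without any emulation, assuming Linux and glibc: the fallocate() wrapper issues the system call directly, so a file system that cannot preallocate reports EOPNOTSUPP instead of returning a success it cannot honor. The helper name reserve_space_strict is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: reserve [offset, offset + len) with the Linux
   fallocate(2) system call and no write-based fallback.  Returns 0 on
   success or an errno value such as EOPNOTSUPP (file system cannot
   preallocate), ENOSPC or EBADF, following posix_fallocate()'s
   return convention. */
static int reserve_space_strict(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);   /* precondition of this sketch */

    /* mode 0: allocate the range and extend the file size if needed. */
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;
    return errno;
}
```

A caller that gets EOPNOTSUPP can then decide for itself whether to continue without a reservation.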

* posix_fallocate behavior in glibc
From: Christoph Hellwig @ 2024-07-29 16:09 UTC
  To: libc-alpha, linux-fsdevel; +Cc: Trond Myklebust

Hi glibc hackers,

Trond brought the glibc posix_fallocate behavior to my attention.

As a refresher, this is how Open Group defines posix_fallocate:

   The posix_fallocate() function shall ensure that any required storage
   for regular file data starting at offset and continuing for len bytes
   is allocated on the file system storage media. If posix_fallocate()
   returns successfully, subsequent writes to the specified file data
   shall not fail due to the lack of free space on the file system
   storage media.

The glibc implementation in sysdeps/posix/posix_fallocate.c, which is
also used by sysdeps/unix/sysv/linux/posix_fallocate.c as a fallback if
the fallocate syscall returns EOPNOTSUPP, is implemented by doing
single-byte writes at intervals of min(f.f_bsize, 4096).

This assumes that writing to a file guarantees that space is allocated
for future writes.  That assumption is false for write-out-of-place file
systems, which have been around since at least the early 1990s but have
become a lot more common in the last decade.  Native Linux examples
include all file systems sitting on zoned devices, where out-of-place
writes are required behavior, but also the nilfs2 file system and the
LFS mode of f2fs.  On top of that, out-of-place writes are fairly common
in storage systems that expose network file system access.

How can we get rid of this glibc fallback, which makes posix_fallocate
non-conformant on these file systems and increases write amplification
for no good reason?



Thread overview: 22+ messages
2024-06-26  6:01 posix_fallocate behavior in glibc Christoph Hellwig
2024-07-29 15:09 ` Christoph Hellwig
2024-07-29 15:11   ` Sam James
  -- strict thread matches above, loose matches on Subject: below --
2024-07-29 16:09 Christoph Hellwig
2024-07-29 17:23 ` Paul Eggert
2024-07-29 17:43   ` Christoph Hellwig
2024-07-29 17:54     ` Adhemerval Zanella Netto
     [not found]     ` <CAPBLoAf11hM0PLhqPG5gUyivU9U1manpOOhDWCPugUmWc1VVUw@mail.gmail.com>
2024-07-29 18:45       ` Christoph Hellwig
2024-07-29 17:57 ` Florian Weimer
2024-07-29 18:44   ` Christoph Hellwig
2024-07-29 18:52     ` Florian Weimer
2024-07-29 19:01       ` Christoph Hellwig
2024-07-29 19:23         ` Florian Weimer
2024-07-30 15:47           ` Christoph Hellwig
2024-07-30 16:11             ` Paul Eggert
2024-07-30 16:20               ` Christoph Hellwig
2024-07-30 17:03                 ` Florian Weimer
2024-07-30 17:08                   ` Christoph Hellwig
2024-07-30 17:29                     ` Florian Weimer
2024-07-30 17:52                   ` Mark Wielaard
2024-07-31  2:32                   ` Theodore Ts'o
2024-07-29 23:53       ` Dave Chinner
