From: Florian Weimer <fweimer@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	 Matthew Wilcox <willy@infradead.org>,
	Hans Holmberg <hans.holmberg@wdc.com>,
	 linux-xfs@vger.kernel.org, Carlos Maiolino <cem@kernel.org>,
	 "Darrick J . Wong" <djwong@kernel.org>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 libc-alpha@sourceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes
Date: Mon, 10 Nov 2025 06:27:41 +0100
Message-ID: <lhuseem1mpe.fsf@oldenburg.str.redhat.com>
In-Reply-To: <aRESlvWf9VquNzx3@dread.disaster.area> (Dave Chinner's message of "Mon, 10 Nov 2025 09:15:50 +1100")

* Dave Chinner:

> On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
>> * Christoph Hellwig:
>> 
>> > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
>> >> It's been a few years, I think, and maybe we should drop the allocation
>> >> logic from posix_fallocate in glibc?  Assuming that it's implemented
>> >> everywhere it makes sense?
>> >
>> > I really think it should go away.  If it turns out we find cases where
>> > it was useful we can try to implement a zeroing fallocate in the kernel
>> > for the file system where people want it.
>
> This is what the shiny new FALLOC_FL_WRITE_ZEROS command is supposed
> to provide. We don't have widespread support in filesystems for it
> yet, though.
>
>> > gfs2 for example currently
>> > has such an implementation, and we could have somewhat generic library
>> > version of it.
>
> Yup, seems like an iomap iter loop would be pretty trivial to
> abstract from that...
>
>> Sorry, I remember now where this got stuck the last time.
>> 
>> This program:
>> 
>> #include <fcntl.h>
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>> 
>> int
>> main(void)
>> {
>>   FILE *fp = tmpfile();
>>   if (fp == NULL)
>>     abort();
>>   int fd = fileno(fp);
>>   posix_fallocate(fd, 0, 1);
>>   char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>   *p = 1;
>> }
>> 
>> should not crash even if the file system does not support fallocate.
>
> I think that's buggy application code.
>
> Failing to check the return value of a library call that documents
> EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
> SIGBUS on the mmap access, because it failed to verify that the file
> extension operation actually worked.

Sorry, I made the example confusing.

How would the application deal with failure due to lack of fallocate
support?  It would have to do a pwrite, like posix_fallocate does
today, or maybe ftruncate.  This is why I think removing the fallback
from posix_fallocate completely is mostly pointless.
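The application-side handling described above can be sketched roughly as follows. This is a minimal sketch assuming Linux and glibc (fallocate needs _GNU_SOURCE); ensure_size and map_and_store are hypothetical helper names, not existing APIs. The helper returns an error number rather than -1, following the posix_fallocate convention. Its ftruncate fallback only sets the file size, so the file stays sparse and a later store through the mapping can still fail if the file system runs out of space. The fstat/ftruncate pair is also racy against a concurrent extension, which is the gap a non-destructive ftruncate in the kernel would close.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a glibc API): make sure the file behind FD
   is at least LEN bytes long without overwriting existing data.
   Returns 0 on success or an errno value on failure.  */
static int
ensure_size(int fd, off_t len)
{
  /* Mode 0 allocates blocks for [0, LEN) and extends the file size if
     needed; it never changes existing file contents.  */
  if (fallocate(fd, 0, 0, len) == 0)
    return 0;
  if (errno != EOPNOTSUPP)
    return errno;

  /* Fallback: only fix the size, leaving the file sparse.  This is
     racy: if another process extends the file between fstat and
     ftruncate, ftruncate shrinks it again and that data is lost.  */
  struct stat st;
  if (fstat(fd, &st) != 0)
    return errno;
  if (st.st_size < len && ftruncate(fd, len) != 0)
    return errno;
  return 0;
}

/* The example program from this thread, with the result checked.
   Returns 0 on success or an errno value on failure.  */
static int
map_and_store(FILE *fp)
{
  int fd = fileno(fp);
  int err = ensure_size(fd, 1);
  if (err != 0)
    return err;
  char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED)
    return errno;
  *p = 1;   /* No SIGBUS: the file is now at least one byte long.  */
  munmap(p, 1);
  return 0;
}
```

With this, a missing fallocate implementation turns into an explicit size-only extension instead of a SIGBUS on the store, and any other failure is reported to the caller.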

>> I hope we can agree on that.  I expect avoiding SIGBUS errors because
>> of insufficient file size is a common use case for posix_fallocate.
>> This use is not really an optimization, it's required to get mmap
>> working properly.
>> 
>> If we can get an fallocate mode that we can use as a fallback to
>> increase the file size with a zero flag argument, we can definitely
>
> The fallocate() API already supports that, in two different ways:
> FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROS.

Neither is appropriate for posix_fallocate because they are as
destructive as the existing fallback.
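To make the destructiveness of FALLOC_FL_ZERO_RANGE concrete, here is a minimal sketch assuming Linux, glibc and a file system that implements that mode; zero_range_clobbers is a hypothetical name used only for illustration. After a successful FALLOC_FL_ZERO_RANGE call, previously written bytes read back as zeros, whereas posix_fallocate must leave existing contents alone. The function returns -1 when the mode is not supported, which is the EOPNOTSUPP case userspace has to be prepared for.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demonstration (not an existing API): write "data" at
   offset 0 of a temporary file, then apply FALLOC_FL_ZERO_RANGE to the
   same four bytes.  Returns 1 if the data now reads back as zeros,
   0 if it survived, or -1 if the mode is unsupported (typically
   EOPNOTSUPP) or another call failed.  */
static int
zero_range_clobbers(void)
{
  FILE *fp = tmpfile();
  if (fp == NULL)
    return -1;
  int fd = fileno(fp);
  int result = -1;
  char buf[4];
  if (pwrite(fd, "data", 4, 0) == 4
      && fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, 4) == 0
      && pread(fd, buf, 4, 0) == 4)
    /* posix_fallocate must never do this to existing contents.  */
    result = buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0;
  fclose(fp);
  return result;
}
```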

> But, again, not all filesystems support these, so userspace has to
> be prepared to receive -EOPNOTSUPP from these calls. Hence userspace
> has to do the right thing for posix_fallocate() if you want to
> ensure that it always extends the file size even when fallocate()
> calls fail...

Sure, but eventually, we may get into a better situation.

>> use that in posix_fallocate (replacing the fallback path on kernels
>> that support it).  All local file systems should be able to implement
>> that (but perhaps not efficiently).  Basically, what we need here is a
>> non-destructive ftruncate.
>
> You aren't going to get support for such new commands on existing
> kernels, so userspace is still going to have to code the ftruncate()
> fallback itself for the desired behaviour to be provided
> consistently to applications.
>
> As such, I don't see any reason for the fallocate() syscall
> providing some whacky "ftruncate() in all but name" mode.

Please reconsider.  If we start fixing this, we'll eventually be in a
position where the glibc fallback code never runs.

Thanks,
Florian


Thread overview: 28+ messages
2025-11-06 13:35 [RFC] xfs: fake fallocate success for always CoW inodes Hans Holmberg
2025-11-06 13:48 ` Florian Weimer
2025-11-06 13:52   ` Christoph Hellwig
2025-11-06 14:42     ` Matthew Wilcox
2025-11-06 14:46       ` Christoph Hellwig
2025-11-11  8:31         ` Hans Holmberg
2025-11-11  9:05           ` hch
2025-11-11  9:50             ` Florian Weimer
2025-11-11 13:40               ` hch
2025-11-06 16:31       ` Florian Weimer
2025-11-06 17:05         ` Christoph Hellwig
2025-11-08 12:30           ` Florian Weimer
2025-11-09 22:15             ` Dave Chinner
2025-11-10  5:27               ` Florian Weimer [this message]
2025-11-10  9:38                 ` Christoph Hellwig
2025-11-10 10:03                   ` Florian Weimer
2025-11-10 20:28                 ` Dave Chinner
2025-11-11  8:56                   ` Christoph Hellwig
2025-11-10  9:37               ` Christoph Hellwig
2025-11-10  9:44                 ` Florian Weimer
2025-11-10 21:33                 ` Dave Chinner
2025-11-11  9:04                   ` Christoph Hellwig
2025-11-11  9:30                   ` Florian Weimer
2025-11-10  9:31             ` Christoph Hellwig
2025-11-10  9:48               ` truncatat? was, " Christoph Hellwig
2025-11-10 10:00                 ` Florian Weimer
2025-11-10  9:49               ` Florian Weimer
2025-11-10  9:52                 ` Christoph Hellwig
