public inbox for linux-kernel@vger.kernel.org
From: Florian Weimer <fweimer@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	Hans Holmberg <hans.holmberg@wdc.com>,
	linux-xfs@vger.kernel.org, Carlos Maiolino <cem@kernel.org>,
	"Darrick J . Wong" <djwong@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	libc-alpha@sourceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes
Date: Mon, 10 Nov 2025 06:27:41 +0100
Message-ID: <lhuseem1mpe.fsf@oldenburg.str.redhat.com>
In-Reply-To: <aRESlvWf9VquNzx3@dread.disaster.area> (Dave Chinner's message of "Mon, 10 Nov 2025 09:15:50 +1100")

* Dave Chinner:

> On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
>> * Christoph Hellwig:
>> 
>> > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
>> >> It's been a few years, I think, and maybe we should drop the allocation
>> >> logic from posix_fallocate in glibc?  Assuming that it's implemented
>> >> everywhere it makes sense?
>> >
>> > I really think it should go away.  If it turns out we find cases where
>> > it was useful we can try to implement a zeroing fallocate in the kernel
>> > for the file system where people want it.
>
> This is what the shiny new FALLOC_FL_WRITE_ZEROS command is supposed
> to provide. We don't have widespread support in filesystems for it
> yet, though.
>
>> > gfs2 for example currently
>> > has such an implementation, and we could have somewhat generic library
>> > version of it.
>
> Yup, seems like an iomap iter loop would be pretty trivial to
> abstract from that...
>
>> Sorry, I remember now where this got stuck the last time.
>> 
>> This program:
>> 
>> #include <fcntl.h>
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>> 
>> int
>> main(void)
>> {
>>   FILE *fp = tmpfile();
>>   if (fp == NULL)
>>     abort();
>>   int fd = fileno(fp);
>>   posix_fallocate(fd, 0, 1);
>>   char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>   *p = 1;
>> }
>> 
>> should not crash even if the file system does not support fallocate.
>
> I think that's buggy application code.
>
> Failing to check the return value of a library call that documents
> EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
> SIGBUS on the mmap access, because it failed to verify that the file
> extension operation actually worked.

Sorry, I made the example confusing.

How would the application deal with failure due to lack of fallocate
support?  It would have to do a pwrite, like posix_fallocate does
today, or maybe ftruncate.  This is why I think removing the fallback
from posix_fallocate completely is mostly pointless.
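
To make this concrete, here is a rough sketch of what every such
application might end up carrying.  It assumes a posix_fallocate
without the fallback, one that reports EOPNOTSUPP when the file system
lacks fallocate support.  The helper name ensure_file_size is made up
for illustration; it is not an existing glibc or kernel interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure FD is at least OFFSET + LEN bytes
   long.  Returns 0 on success or an error number, in the style of
   posix_fallocate.  */
int
ensure_file_size(int fd, off_t offset, off_t len)
{
  int ret = posix_fallocate(fd, offset, len);
  if (ret != EOPNOTSUPP)
    /* Success, or an error that the fallback cannot fix.  */
    return ret;

  /* Assumed fallback: grow the file with ftruncate, but only if it
     is currently too short, so that existing data is not cut off.  */
  struct stat st;
  if (fstat(fd, &st) != 0)
    return errno;
  if (st.st_size >= offset + len)
    return 0;

  /* This is racy: if another writer extends the file between the
     fstat and the ftruncate, the ftruncate discards that data.  */
  if (ftruncate(fd, offset + len) != 0)
    return errno;
  return 0;
}
```

The window between fstat and ftruncate is the reason this cannot be
done reliably in userspace, and why a non-destructive size increase
would have to come from the kernel.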

>> I hope we can agree on that.  I expect avoiding SIGBUS errors because
>> of insufficient file size is a common use case for posix_fallocate.
>> This use is not really an optimization, it's required to get mmap
>> working properly.
>> 
>> If we can get an fallocate mode that we can use as a fallback to
>> increase the file size with a zero flag argument, we can definitely
>
> The fallocate() API already supports that, in two different ways:
> FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROS.

Neither is appropriate for posix_fallocate because they are as
destructive as the existing fallback.

> But, again, not all filesystems support these, so userspace has to
> be prepared to receive -EOPNOTSUPP from these calls. Hence userspace
> has to do the right thing for posix_fallocate() if you want to
> ensure that it always extends the file size even when fallocate()
> calls fail...

Sure, but eventually, we may get into a better situation.

>> use that in posix_fallocate (replacing the fallback path on kernels
>> that support it).  All local file systems should be able to implement
>> that (but perhaps not efficiently).  Basically, what we need here is a
>> non-destructive ftruncate.
>
> You aren't going to get support for such new commands on existing
> kernels, so userspace is still going to have to code the ftruncate()
> fallback itself for the desired behaviour to be provided
> consistently to applications.
>
> As such, I don't see any reason for the fallocate() syscall
> providing some whacky "ftruncate() in all but name" mode.

Please reconsider.  If we start fixing this, we'll eventually be in a
position where the glibc fallback code never runs.

Thanks,
Florian
