From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Dimitrios Apostolou <jimis@gmx.net>
Cc: Gerhard Wiesinger <lists@wiesinger.com>, linux-btrfs@vger.kernel.org
Subject: Re: mount compress=zstd leaves files uncompressed, that used to compress well with before
Date: Fri, 28 Mar 2025 06:51:42 +1030
Message-ID: <b9f7b83d-5efa-4906-9df3-a27f399162fb@gmx.com>
In-Reply-To: <ba2a850f-6697-7555-baa4-32bc6bf62f81@gmx.net>
On 2025/3/28 00:10, Dimitrios Apostolou wrote:
> On Thu, 27 Mar 2025, Qu Wenruo wrote:
>>
>> On 2025/3/26 22:45, Dimitrios Apostolou wrote:
>>>
>>> Can't the solution/workaround be way more simple, or stupid even?
>>>
>>> * Either have fallocate(2) return EOPNOTSUPP on a force-compress
>>> filesystem, and leave the work-around to userspace,
>>
>> Unfortunately fallocate has higher priority, not vice versa.
>>
>> In most cases, compression is a nice-to-have feature, but even with
>> force-compression, we can still have cases that won't be compressed.
>
> Do you know of other cases besides fallocate?
Data from /dev/urandom or something similar. With that kind of data the
compressed output ends up larger than the original, so btrfs will abort
compression no matter the mount option.
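
A small userspace illustration of that effect, not btrfs code. It uses
Python's zlib as a stand-in for zstd (an assumption made only to keep the
sketch dependency-free), and the 128 KiB chunk size is just an illustrative
choice. Random input comes out slightly larger than it went in, which is
the situation where keeping the compressed copy makes no sense.

```python
import os
import zlib

CHUNK = 128 * 1024  # illustrative chunk size, an assumption of this sketch


def compressed_size(data: bytes) -> int:
    """Return the size of `data` after zlib compression (stand-in for zstd)."""
    return len(zlib.compress(data))


random_chunk = os.urandom(CHUNK)  # incompressible, like /dev/urandom output
zero_chunk = bytes(CHUNK)         # trivially compressible

random_out = compressed_size(random_chunk)
zero_out = compressed_size(zero_chunk)

# A compressed copy is only worth storing if it is smaller than the input.
print(f"random: {CHUNK} -> {random_out} bytes, worth keeping: {random_out < CHUNK}")
print(f"zeros:  {CHUNK} -> {zero_out} bytes, worth keeping: {zero_out < CHUNK}")
```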
>
>>
>> On the other hand, all major upstream fses have support for fallocate,
>> and although I understand preallocation is no longer as simple as
>> non-COW filesystems, not supporting it would still be a big surprise to
>> a lot of user space tools.
>
> I checked what openzfs does, and here is an excerpt from the commit
> message that added support for fallocate:
>
> Since ZFS does COW and snapshots, preallocating blocks for a file
> cannot guarantee that writes to the file will not run out of space.
> Instead, make a best-effort attempt to check that at least enough
> space is currently available in the pool (12% margin), then create
> a sparse file of the requested size and continue on with life.
>
> The whole commit with some discussion is at [1], while a long issue
> discussing alternatives is at [2].
>
> [1] https://github.com/openzfs/zfs/pull/10408
> [2] https://github.com/openzfs/zfs/issues/326
>
> It could be the solution for btrfs too, to just check if such space plus a
> margin is available and return a sparse file. We lie to userspace about
> guaranteeing that write() can't fail, but as you mentioned, we are already
> lying:
In that case, I'd prefer to return EOPNOTSUPP for fallocate, rather than
even try to emulate the behavior the way ZFS does.
At least we now have one more fs showing how bad fallocate is on a COW fs.
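
For reference, the best-effort behavior described in the quoted commit
message can be approximated from userspace roughly as below. This is only
a sketch, not the ZFS implementation: the function name is hypothetical,
and reading "12% margin" as "12% on top of the requested size" is an
assumption. It also makes the weakness visible: nothing is reserved, so a
later write() can still fail with ENOSPC.

```python
import errno
import os

MARGIN = 0.12  # assumption: "12% margin" read as 12% extra on top of the request


def best_effort_preallocate(fd: int, size: int) -> None:
    """Check free space plus a margin, then extend the file sparsely.

    No blocks are reserved, so a later write() can still hit ENOSPC.
    """
    vfs = os.fstatvfs(fd)
    available = vfs.f_bavail * vfs.f_frsize  # bytes free for unprivileged users
    needed = int(size * (1 + MARGIN))
    if available < needed:
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    # Only ever grow the file; never cut off existing data.
    if os.fstat(fd).st_size < size:
        os.ftruncate(fd, size)
```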
>
>> Not that easily either. Fallocate itself should mean the next write into
>> the fallocated range will not fail with ENOSPC.
>>
>> Although that assumption itself is no longer correct on btrfs, (e.g.
>> fallocate, then snapshot).
>
> Anyway,
>
>>
>> Emotionally, though, I agree with you. Fallocation on btrfs is just
>> looking for extra problems, and if I had the final call, I would be more
>> than happy to nuke fallocation support.
>
> From a purist's perspective I also find EOPNOTSUPP to be the best solution.
>
> * Better for the kernel: no complicated workarounds, no lies to userspace
>
> * Better for the application: it gets to know that there are no guarantees
> on space allocation
>
> * Better for the admin: the files get compressed as the mount options
> mandate
>
> The only disadvantage I see is breaking the applications that don't
> implement fallback code to {posix_,}fallocate() returning
> EOPNOTSUPP/EINVAL.
> I have to ask here, is posix_fallocate() mandated by some standard?
> If not, it's an application bug.
Nope, it's not a hard requirement. In fact some older fses (still
supported upstream) do not support fallocate at all.
E.g. Ext2 doesn't support fallocate.
But suddenly dropping a feature which we have supported from the start is
a little concerning.
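
The fallback code mentioned above could look roughly like this on the
application side. A minimal sketch: the function name and the injectable
`allocate` parameter are hypothetical and only there so the unsupported
path can be exercised without a special filesystem. Note that, per the
posix_fallocate(3) man page, glibc's posix_fallocate() already emulates
the call by writing to each block when the filesystem lacks support, so
EOPNOTSUPP is mostly seen by callers of the raw fallocate(2) syscall.

```python
import errno
import os

# Error codes named in this thread as the "not supported" answers.
# EINVAL can also mean bad arguments; this sketch does not tell them apart.
UNSUPPORTED = (errno.EOPNOTSUPP, errno.EINVAL)


def reserve_or_extend(fd: int, size: int, allocate=os.posix_fallocate) -> bool:
    """Try to preallocate `size` bytes; fall back to a sparse extend.

    Returns True if the allocation call succeeded, False if the caller
    only got a larger file and must be ready for ENOSPC on write().
    """
    try:
        allocate(fd, 0, size)
        return True
    except OSError as exc:
        if exc.errno not in UNSUPPORTED:
            raise  # a real error (ENOSPC, EBADF, ...) is not hidden
    # Fallback: grow the file without reserving anything.
    if os.fstat(fd).st_size < size:
        os.ftruncate(fd, size)
    return False
```

The boolean return value lets the application know which guarantee it
actually got, instead of silently assuming that write() cannot fail.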
>
> Maybe the best tradeoff is to add a mount option fallocate=off.
That will be feasible.
I can try to push in that direction after you have updated the docs.
>
>>
>>>
>>> * or fill up the holes with compressed zeros, basically implementing the
>>> work-around in kernelspace. I suspect this would be very cheap in a
>>> deduplicating filesystem like btrfs, since all the zero-filled
>>> compressed extents are essentially identical.
>>
>>
>> But doing compressed zeros means we gain nothing from the old
>> preallocation behavior, and still waste space on holes.
>
> I might be misunderstanding the terminology. I thought a "hole" is one
> block or extent of zeros. If that's one block referenced (deduplicated)
> multiple times, then there is no space wasted, right? It's just a lie:
> btrfs allocated no space for the hole.
Oh, in that case, a hole really means a hole: no space is taken on
disk, and all the zeros are just filled in at read time.
Thus there is no compressed or non-compressed hole; it's really a hole,
a void.
And in that case I guess you mean making fallocate fall back to hole
punching (for unallocated ranges).
Which is still not as good as EOPNOTSUPP IMHO.
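
What a hole looks like from userspace can be seen with a few lines of
Python. This is a generic sketch that works on any filesystem with sparse
file support, not something btrfs specific; the 1 MiB size is arbitrary,
and how many blocks are reported as allocated depends on the filesystem.

```python
import os
import tempfile

SIZE = 1024 * 1024  # 1 MiB, arbitrary size for the demonstration

with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.ftruncate(fd, SIZE)        # extend without writing: the range is a hole
    data = os.pread(fd, SIZE, 0)  # zeros are filled in at read time
    st = os.fstat(fd)
    hole_reads_as_zeros = data == bytes(SIZE)
    print("logical size:", st.st_size)
    # st_blocks counts 512-byte units; for a hole it is typically far
    # below the logical size.
    print("allocated bytes:", st.st_blocks * 512)
    print("reads back as zeros:", hole_reads_as_zeros)
```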
Thanks,
Qu
>
>
> Thank you,
> Dimitris