From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Gerhard Wiesinger <lists@wiesinger.com>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS doesn't compress on the fly
Date: Sun, 3 Dec 2023 08:26:13 +1030
Message-ID: <6ae85272-3967-417e-bc9a-e2141a4c688a@gmx.com>
In-Reply-To: <d30abb90-4ab2-4f66-88b0-7d0992b41527@gmx.com>
On 2023/12/3 06:37, Qu Wenruo wrote:
>
>
> On 2023/12/2 22:32, Gerhard Wiesinger wrote:
>> Hello Qu,
>>
>> Thank you for the answers, see inline.
>>
>> Any further ideas?
>>
>> Ciao,
>> Gerhard.
>>
>> On 30.11.2023 21:53, Qu Wenruo wrote:
>>>
>>>
>>> On 2023/11/30 21:51, Gerhard Wiesinger wrote:
>>>> Dear All,
>>>>
>>>> I created a new BTRFS volume and migrated an existing PostgreSQL
>>>> database onto it. Versions are recent.
>>>
>>> Does the database directory have something like NODATACOW or NODATASUM
>>> set?
>>> The other possibility is preallocation: the first write to a
>>> preallocated range is treated as NOCOW, no matter whether compression
>>> is enabled.
>>>
>> I don't think so. How can I find out (I have already googled a lot)?
>
> I normally use `btrfs ins dump-tree` to dump the subvolume, then grep
> for the inode number with `grep -A 3 "item .* key (257 INODE_ITEM 0)"`,
> which shows something like this:
>
> item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160
> generation 7 transid 8 size 4194304 nbytes 4194304
> block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
> sequence 513 flags 0x10(PREALLOC)
>
> The flags field holds the btrfs-specific inode flags, and would show
> NODATACOW or NODATASUM if they are set.
>
>>
>> At least it is not mounted with these options (see also original post).
>>
>> # Mounted via force
>> findmnt -vno OPTIONS /var/lib/pgsql
>> rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/'
>>
>> According to the following link it should compress anyway with the -o
>> compress-force option:
>>
>> https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Compression.html#What.27s_the_precedence_of_all_the_options_affecting_compression.3F
>> Compression to newly written data happens:
>> always -- if the filesystem is mounted with -o compress-force
>> never -- if the NOCOMPRESS flag is set per-file/-directory
>> if possible -- if the COMPRESS per-file flag (aka chattr +c) is set, but
>> it may get converted to NOCOMPRESS eventually
>> if possible -- if the -o compress mount option is specified
>> Note, that mounting with -o compress will not set the +c file attribute.
>
> Well, if you check the kernel code, btrfs_run_delalloc_range() calls
> should_nocow() to check whether we should fall back to the NOCOW path.
>
> should_nocow() checks whether the inode has the NODATACOW or PREALLOC
> flag, then verifies whether there is any defrag request for it.
> If there is no defrag request, the write can go NOCOW, which breaks the
> COW requirement of compression.
>
>>
> [...]
>>>> # Stays here at this compression level
>>>> compsize -x /var/lib/pgsql
>>>> Processed 5332 files, 575858 regular extents (591204 refs), 40 inline.
>>>> Type Perc Disk Usage Uncompressed Referenced
>>>> TOTAL 63% 51G 80G 80G
>>>> none 100% 40G 40G 40G
>>>> zstd 27% 10G 40G 40G
>>>> prealloc 100% 5.0M 5.0M 5.5M
>>>
>>> Not sure if preallocation is the cause, but maybe you can try
>>> disabling preallocation in postgresql?
>>>
>>> Preallocation doesn't make that much sense on btrfs anyway, as there
>>> are too many cases that can break it.
>>
>>
>> I googled a lot and didn't find anything useful about preallocation and
>> postgresql (looks like it doesn't use fallocate).
>
> I don't think so.
>
>>
>> How can I find out more about preallocation?
>
> The compsize output above already shows some preallocated space.
>
> Thus I'm wondering if the preallocation is the cause.
>
> should_nocow() also checks the PREALLOC inode flag and tries the NOCOW
> path first (then falls back to COW if needed).

Yep, I just reproduced it. For any inode with the PREALLOC flag (i.e. the
file has some preallocated range), compression does not happen even when
we write into a range that needs COW anyway (e.g. new writes that enlarge
the file).

# mkfs.btrfs test.img
# mount test.img -o compress-force=zstd /mnt/btrfs
# fallocate -l 128k /mnt/btrfs/file1
# xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file1
# xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file2
# sync

Since file1 has a 128K preallocated range, its inode has the PREALLOC
flag, which leads to no compression:

item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160
        generation 8 transid 8 size 262144 nbytes 262144
        block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
        sequence 33 flags 0x10(PREALLOC) <<<
item 7 key (257 INODE_REF 256) itemoff 15796 itemsize 15
        index 2 namelen 5 name: file1
item 8 key (257 EXTENT_DATA 0) itemoff 15743 itemsize 53
        generation 8 type 2 (prealloc)
        prealloc data disk byte 13631488 nr 131072
        prealloc data offset 0 nr 131072
item 9 key (257 EXTENT_DATA 131072) itemoff 15690 itemsize 53
        generation 8 type 1 (regular)
        extent data disk byte 13762560 nr 131072
        extent data offset 0 nr 131072 ram 131072
        extent compression 0 (none) <<<

Meanwhile the other file, which has no preallocation, goes through the
regular compression path:

item 10 key (258 INODE_ITEM 0) itemoff 15530 itemsize 160
        generation 8 transid 8 size 262144 nbytes 131072
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 32 flags 0x0(none)
item 11 key (258 INODE_REF 256) itemoff 15515 itemsize 15
        index 3 namelen 5 name: file2
item 12 key (258 EXTENT_DATA 131072) itemoff 15462 itemsize 53
        generation 8 type 1 (regular)
        extent data disk byte 13893632 nr 4096
        extent data offset 0 nr 131072 ram 131072
        extent compression 3 (zstd)
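
For a larger tree, comparing the two dumps by eye gets tedious. A minimal
sketch of a filter that condenses dump-tree text into one line per regular
extent (inode number, inode flags, file offset, compression) could look
like the following. The function name extent_report is made up for
illustration, and the script assumes the trimmed field layout shown in the
dumps above; other btrfs-progs versions may print things differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads dump-tree text on
# stdin and prints "<inode> <inode flags> <file offset> <compression>"
# for every regular file extent it finds.
extent_report() {
    awk '
    # Item header, e.g.: item 9 key (257 EXTENT_DATA 131072) itemoff ...
    $1 == "item" && $3 == "key" {
        ino = substr($4, 2)              # strip the leading parenthesis
        kind = $5                        # INODE_ITEM, EXTENT_DATA, ...
        off = $6; sub(/\)$/, "", off)    # strip the trailing parenthesis
        next
    }
    # Inode flags line, e.g.: sequence 33 flags 0x10(PREALLOC)
    kind == "INODE_ITEM" && $1 == "sequence" && $3 == "flags" {
        flags[ino] = $4
    }
    # Compression line of a regular extent, e.g.: extent compression 3 (zstd)
    kind == "EXTENT_DATA" && $1 == "extent" && $2 == "compression" {
        comp = $4; gsub(/[()]/, "", comp)
        f = (ino in flags) ? flags[ino] : "unknown"
        print ino, f, off, comp
    }'
}
```

It would be fed with something like
`btrfs inspect-internal dump-tree -t 5 test.img | extent_report`, assuming
the files live in the top-level subvolume (tree id 5). Prealloc extents
carry no compression line in the dumps above, so they are skipped.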

To me, this looks like a bug, and the reason is exactly what I explained
before.

The worst part is that as long as the inode has the PREALLOC flag, even
if all preallocated extents have been used, compression is prevented for
that inode forever.
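
To get a rough idea of which inodes are in that state, one could look for
inodes that still carry the PREALLOC flag but no longer have any prealloc
extent. This is only a sketch under the same assumptions about the
dump-tree text layout as the dumps above; the function name stale_prealloc
is made up, and it treats the input as a single tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads dump-tree text on
# stdin and prints the inode numbers that have the PREALLOC flag set but
# no remaining "type 2 (prealloc)" extent.
stale_prealloc() {
    awk '
    # Item header: remember which inode and item type we are inside.
    $1 == "item" && $3 == "key" {
        ino = substr($4, 2)
        kind = $5
        next
    }
    # Inode flags line, e.g.: sequence 33 flags 0x10(PREALLOC)
    kind == "INODE_ITEM" && $3 == "flags" && $4 ~ /PREALLOC/ {
        flagged[ino] = 1
    }
    # Extent type line, e.g.: generation 8 type 2 (prealloc)
    kind == "EXTENT_DATA" && $1 == "generation" && $5 == "(prealloc)" {
        has_prealloc[ino] = 1
    }
    END {
        for (i in flagged)
            if (!(i in has_prealloc))
                print i
    }' | sort -n
}
```

With the reproducer above, file1 (inode 257) would not be listed, since it
still has its prealloc extent at offset 0; an inode would only show up
once all of its preallocated ranges have been written.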

Let me try to fix the fallback-to-COW path so that it includes compression.

Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>>
>>
>