* BTRFS doesn't compress on the fly
@ 2023-11-30 11:21 Gerhard Wiesinger
  2023-11-30 20:53 ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Gerhard Wiesinger @ 2023-11-30 11:21 UTC (permalink / raw)
To: linux-btrfs

Dear All,

I created a new BTRFS volume and migrated an existing PostgreSQL database onto it. Versions are recent.

Compression is not done on the fly, although everything is IMHO configured correctly to do so.

I need to run the following command so that everything gets compressed:
btrfs filesystem defragment -r -v -czstd /var/lib/pgsql

I also had the problem that
chattr -R +c /var/lib/pgsql
didn't work for some files.

Find further details below.

Looks like a bug to me.

Any ideas?

Thanx.

Ciao,
Gerhard

uname -a
Linux myhostname 6.5.12-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 20 22:44:24 UTC 2023 x86_64 GNU/Linux

btrfs --version
btrfs-progs v6.5.1

btrfs filesystem show
Label: 'database'  uuid: 6ad6ef90-30fa-4979-9509-99803f7545aa
        Total devices 1 FS bytes used 15.76GiB
        devid    1 size 129.98GiB used 21.06GiB path /dev/mapper/datab

btrfs filesystem df /var/lib/pgsql
Data, single: total=19.00GiB, used=15.61GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=151.92MiB
GlobalReserve, single: total=85.38MiB, used=0.00B

# Mounted via force
findmnt -vno OPTIONS /var/lib/pgsql
rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/

# All files even have the "c" attribute, set after creation of the filesystem
lsattr /var/lib/pgsql
--------c------------- /var/lib/pgsql/16

# Should be empty and is empty, so everything has the compressed attribute (after creation and also all new files)
lsattr -R /var/lib/pgsql | grep -v "^/" | grep -v "^$" | grep -v "^........c"

# Stays at this compression level
compsize -x /var/lib/pgsql
Processed 5332 files, 575858 regular extents (591204 refs), 40 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       63%       51G          80G          80G
none       100%       40G          40G          40G
zstd        27%       10G          40G          40G
prealloc   100%      5.0M         5.0M         5.5M

# After running: btrfs filesystem defragment -r -v -czstd /var/lib/pgsql
compsize -x /var/lib/pgsql
Processed 5563 files, 664076 regular extents (664076 refs), 40 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       19%       15G          80G          80G
none       100%      120K         120K         120K
zstd        19%       15G          80G          80G

# When first creating the filesystem, I also had the problem that I couldn't change all attributes and didn't find a way to get rid of this. Any ideas?
chattr -R +c /var/lib/pgsql
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/1/2836
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/1/2840
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/1/2838
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/4/2836
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/4/2838
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/5/2836
chattr: Invalid argument while setting flags on /var/lib/pgsql/16/data/base/5/2838

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-11-30 11:21 BTRFS doesn't compress on the fly Gerhard Wiesinger @ 2023-11-30 20:53 ` Qu Wenruo 2023-12-02 12:02 ` Gerhard Wiesinger 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2023-11-30 20:53 UTC (permalink / raw) To: Gerhard Wiesinger, linux-btrfs On 2023/11/30 21:51, Gerhard Wiesinger wrote: > Dear All, > > I created a new BTRFS volume with migrating an existing PostgreSQL > database on it. Versions are recent. Does the database directory have something like NODATACOW or NODATASUM set? The other possibility is preallocation: for the first write to a preallocated range, no matter whether compression is enabled, the write is treated as NOCOW. > > Compression is not done on the fly although everything is IMHO > configured correctly to do so. > > I need to run the following command that everything gets compressed: > btrfs filesystem defragment -r -v -czstd /var/lib/pgsql > > Had also a problem that > chattr -R +c /var/lib/pgsql > didn't work for some files. > > Find further details below. > > Looks like a bug to me. > > Any ideas? > > Thanx. 
> > Ciao, > Gerhard > > uname -a > Linux myhostname 6.5.12-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov > 20 22:44:24 UTC 2023 x86_64 GNU/Linux > > btrfs --version > btrfs-progs v6.5.1 > > btrfs filesystem show > Label: 'database' uuid: 6ad6ef90-30fa-4979-9509-99803f7545aa > Total devices 1 FS bytes used 15.76GiB > devid 1 size 129.98GiB used 21.06GiB path /dev/mapper/datab > > btrfs filesystem df /var/lib/pgsql > Data, single: total=19.00GiB, used=15.61GiB > System, DUP: total=32.00MiB, used=16.00KiB > Metadata, DUP: total=1.00GiB, used=151.92MiB > GlobalReserve, single: total=85.38MiB, used=0.00B > > # Mounted via force > findmnt -vno OPTIONS /var/lib/pgsql > rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/' > > # all files even have "c" attribute, set after creation of the filesystem > lsattr /var/lib/pgsql > --------c------------- /var/lib/pgsql/16 > > # Should be empty and is empty, so everything has the comressed > attribute (after creation and also all new files) > lsattr -R /var/lib/pgsql | grep -v "^/" | grep -v "^$" | grep -v > "^........c" > > # Stays here at this compression level > compsize -x /var/lib/pgsql > Processed 5332 files, 575858 regular extents (591204 refs), 40 inline. > Type Perc Disk Usage Uncompressed Referenced > TOTAL 63% 51G 80G 80G > none 100% 40G 40G 40G > zstd 27% 10G 40G 40G > prealloc 100% 5.0M 5.0M 5.5M Not sure if the preallocation is the cause, but maybe you can try disabling preallocation of postgresql? As preallocation doesn't make that much sense on btrfs, there are too many cases that can break the preallocation. > > # After running: btrfs filesystem defragment -r -v -czstd /var/lib/pgsql > compsize -x /var/lib/pgsql > Processed 5563 files, 664076 regular extents (664076 refs), 40 inline. 
> Type Perc Disk Usage Uncompressed Referenced > TOTAL 19% 15G 80G 80G > none 100% 120K 120K 120K > zstd 19% 15G 80G 80G > > # At the first time creating the filesystem I had also the problem that > I couln't change all attributes, didn't find a way to get rid of this. > Any ideas. > chattr -R +c /var/lib/pgsql > chattr: Invalid argument while setting flags on A lot of flags can only be set on empty files IIRC. Thanks, Qu > /var/lib/pgsql/16/data/base/1/2836 > chattr: Invalid argument while setting flags on > /var/lib/pgsql/16/data/base/1/2840 > chattr: Invalid argument while setting flags on > /var/lib/pgsql/16/data/base/1/2838 > chattr: Invalid argument while setting flags on > /var/lib/pgsql/16/data/base/4/2836 > chattr: Invalid argument while setting flags on > /var/lib/pgsql/16/data/base/4/2838 > chattr: Invalid argument while setting flags on > /var/lib/pgsql/16/data/base/5/2836 > chattr: Invalid argument while setting flags on > /var/lib/pgsql/16/data/base/5/2838 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-11-30 20:53 ` Qu Wenruo @ 2023-12-02 12:02 ` Gerhard Wiesinger 2023-12-02 20:07 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Gerhard Wiesinger @ 2023-12-02 12:02 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs Hello Qu, Thank you for the answers, see inline. Any further ideas? Ciao, Gerhard. On 30.11.2023 21:53, Qu Wenruo wrote: > > > On 2023/11/30 21:51, Gerhard Wiesinger wrote: >> Dear All, >> >> I created a new BTRFS volume with migrating an existing PostgreSQL >> database on it. Versions are recent. > > Does the data base directory has something like NODATACOW or NODATASUM > set? > The other possibility is preallocation, for the first write on > preallocated range, no matter if the compression is enabled, the write > would be treated as NOCOW. > I don't think so. How can I find out? (I googled a lot already.) At least it is not mounted with these options (see also original post). # Mounted via force findmnt -vno OPTIONS /var/lib/pgsql rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/ According to the following link it should compress anyway with the -o compress-force option: https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Compression.html#What.27s_the_precedence_of_all_the_options_affecting_compression.3F Compression to newly written data happens: always -- if the filesystem is mounted with -o compress-force never -- if the NOCOMPRESS flag is set per-file/-directory if possible -- if the COMPRESS per-file flag (aka chattr +c) is set, but it may get converted to NOCOMPRESS eventually if possible -- if the -o compress mount option is specified Note, that mounting with -o compress will not set the +c file attribute. >> >> Compression is not done on the fly although everything is IMHO >> configured correctly to do so. 
>> >> I need to run the following command that everything gets compressed: >> btrfs filesystem defragment -r -v -czstd /var/lib/pgsql >> >> Had also a problem that >> chattr -R +c /var/lib/pgsql >> didn't work for some files. >> >> Find further details below. >> >> Looks like a bug to me. >> >> Any ideas? >> >> Thanx. >> >> Ciao, >> Gerhard >> >> uname -a >> Linux myhostname 6.5.12-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov >> 20 22:44:24 UTC 2023 x86_64 GNU/Linux >> >> btrfs --version >> btrfs-progs v6.5.1 >> >> btrfs filesystem show >> Label: 'database' uuid: 6ad6ef90-30fa-4979-9509-99803f7545aa >> Total devices 1 FS bytes used 15.76GiB >> devid 1 size 129.98GiB used 21.06GiB path /dev/mapper/datab >> >> btrfs filesystem df /var/lib/pgsql >> Data, single: total=19.00GiB, used=15.61GiB >> System, DUP: total=32.00MiB, used=16.00KiB >> Metadata, DUP: total=1.00GiB, used=151.92MiB >> GlobalReserve, single: total=85.38MiB, used=0.00B >> >> # Mounted via force >> findmnt -vno OPTIONS /var/lib/pgsql >> rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/' >> >> # all files even have "c" attribute, set after creation of the >> filesystem >> lsattr /var/lib/pgsql >> --------c------------- /var/lib/pgsql/16 >> >> # Should be empty and is empty, so everything has the comressed >> attribute (after creation and also all new files) >> lsattr -R /var/lib/pgsql | grep -v "^/" | grep -v "^$" | grep -v >> "^........c" >> >> # Stays here at this compression level >> compsize -x /var/lib/pgsql >> Processed 5332 files, 575858 regular extents (591204 refs), 40 inline. >> Type Perc Disk Usage Uncompressed Referenced >> TOTAL 63% 51G 80G 80G >> none 100% 40G 40G 40G >> zstd 27% 10G 40G 40G >> prealloc 100% 5.0M 5.0M 5.5M > > Not sure if the preallocation is the cause, but maybe you can try > disabling preallocation of postgresql? > > As preallocation doesn't make that much sense on btrfs, there are too > many cases that can break the preallocation. 
I googled a lot and didn't find anything useful about preallocation and PostgreSQL (it looks like it doesn't use fallocate). How can I find out anything about preallocation? ^ permalink raw reply [flat|nested] 12+ messages in thread
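The question above — how to tell whether files carry preallocated space — can also be probed with generic tools, without btrfs-specific tooling. A minimal sketch (the 1 MiB size and mktemp path are examples only): a fallocated file reports on-disk blocks covering space that was never written to.

```shell
# Sketch: a preallocated file occupies on-disk blocks (st_blocks * 512)
# at least as large as its apparent size, even though no data was written.
f=$(mktemp)
fallocate -l 1M "$f"
stat -c 'size=%s bytes, on-disk=%b blocks of 512 bytes' "$f"
rm -f "$f"
```

For a running server process, `strace -f -p <pid> -e trace=fallocate` (the PID is a placeholder) would show directly whether it ever issues fallocate() calls.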
* Re: BTRFS doesn't compress on the fly 2023-12-02 12:02 ` Gerhard Wiesinger @ 2023-12-02 20:07 ` Qu Wenruo 2023-12-02 21:56 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2023-12-02 20:07 UTC (permalink / raw) To: Gerhard Wiesinger, linux-btrfs On 2023/12/2 22:32, Gerhard Wiesinger wrote: > Hello Qu, > > Thank you for the answers, see inline. > > Any further ideas? > > Ciao, > Gerhard. > > On 30.11.2023 21:53, Qu Wenruo wrote: >> >> >> On 2023/11/30 21:51, Gerhard Wiesinger wrote: >>> Dear All, >>> >>> I created a new BTRFS volume with migrating an existing PostgreSQL >>> database on it. Versions are recent. >> >> Does the data base directory has something like NODATACOW or NODATASUM >> set? >> The other possibility is preallocation, for the first write on >> preallocated range, no matter if the compression is enabled, the write >> would be treated as NOCOW. >> > I don't think so. How to find out (googled already a lot)? I normally go `btrfs ins dump-tree`, dump the subvolume, grep for the inode number with `grep -A 3 "item .* key (257 INODE_ITEM 0)"`, which would show something like this: item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160 generation 7 transid 8 size 4194304 nbytes 4194304 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 sequence 513 flags 0x10(PREALLOC) The flags is the btrfs specific flags, which would show NODATACOW or NODATASUM. > > At least it is not mounted with these options (see also original post). 
> > # Mounted via force > findmnt -vno OPTIONS /var/lib/pgsql > rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/' > > According to the following link it should compress anyway with the -o > compress-force option: > > https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Compression.html#What.27s_the_precedence_of_all_the_options_affecting_compression.3F > Compression to newly written data happens: > always -- if the filesystem is mounted with -o compress-force > never -- if the NOCOMPRESS flag is set per-file/-directory > if possible -- if the COMPRESS per-file flag (aka chattr +c) is set, but > it may get converted to NOCOMPRESS eventually > if possible -- if the -o compress mount option is specified > Note, that mounting with -o compress will not set the +c file attribute. Well, if you check the kernel code: btrfs_run_delalloc_range() calls should_nocow() to check whether we should fall back to the NOCOW path. should_nocow() checks whether the inode has the NODATACOW or PREALLOC flag, then verifies whether there is any defrag request for it. If there is no defrag request, the write can go NOCOW, thus breaking the COW requirement. > [...] >>> # Stays here at this compression level >>> compsize -x /var/lib/pgsql >>> Processed 5332 files, 575858 regular extents (591204 refs), 40 inline. >>> Type Perc Disk Usage Uncompressed Referenced >>> TOTAL 63% 51G 80G 80G >>> none 100% 40G 40G 40G >>> zstd 27% 10G 40G 40G >>> prealloc 100% 5.0M 5.0M 5.5M >> >> Not sure if the preallocation is the cause, but maybe you can try >> disabling preallocation of postgresql? >> >> As preallocation doesn't make that much sense on btrfs, there are too >> many cases that can break the preallocation. > > I googled a lot and didn't find anything useful with preallocation and > postgresql (looks like it doesn'use fallocate). I don't think so. > > How can I find something about preallocation out? The compsize output above already shows there is some preallocated space. 
Thus I'm wondering if the preallocation is the cause, as should_nocow() also checks the PREALLOC inode flag and tries the NOCOW path first (then falls back to COW if needed). Thanks, Qu > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-12-02 20:07 ` Qu Wenruo @ 2023-12-02 21:56 ` Qu Wenruo 2023-12-03 8:24 ` Gerhard Wiesinger 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2023-12-02 21:56 UTC (permalink / raw) To: Gerhard Wiesinger, linux-btrfs On 2023/12/3 06:37, Qu Wenruo wrote: > > > On 2023/12/2 22:32, Gerhard Wiesinger wrote: >> Hello Qu, >> >> Thank you for the answers, see inline. >> >> Any further ideas? >> >> Ciao, >> Gerhard. >> >> On 30.11.2023 21:53, Qu Wenruo wrote: >>> >>> >>> On 2023/11/30 21:51, Gerhard Wiesinger wrote: >>>> Dear All, >>>> >>>> I created a new BTRFS volume with migrating an existing PostgreSQL >>>> database on it. Versions are recent. >>> >>> Does the data base directory has something like NODATACOW or NODATASUM >>> set? >>> The other possibility is preallocation, for the first write on >>> preallocated range, no matter if the compression is enabled, the write >>> would be treated as NOCOW. >>> >> I don't think so. How to find out (googled already a lot)? > > I normally go `btrfs ins dump-tree`, dump the subvolume, grep for the > inode number with `grep -A 3 "item .* key (257 INODE_ITEM 0)"`, which > would show something like this: > > item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160 > generation 7 transid 8 size 4194304 nbytes 4194304 > block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 > sequence 513 flags 0x10(PREALLOC) > > The flags is the btrfs specific flags, which would show NODATACOW or > NODATASUM. > >> >> At least it is not mounted with these options (see also original post). 
>> >> # Mounted via force >> findmnt -vno OPTIONS /var/lib/pgsql >> rw,relatime,compress-force=zstd:3,space_cache=v2,subvolid=5,subvol=/' >> >> According to the following link it should compress anyway with the -o >> compress-force option: >> >> https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Compression.html#What.27s_the_precedence_of_all_the_options_affecting_compression.3F >> Compression to newly written data happens: >> always -- if the filesystem is mounted with -o compress-force >> never -- if the NOCOMPRESS flag is set per-file/-directory >> if possible -- if the COMPRESS per-file flag (aka chattr +c) is set, but >> it may get converted to NOCOMPRESS eventually >> if possible -- if the -o compress mount option is specified >> Note, that mounting with -o compress will not set the +c file attribute. > > Well, if you check the kernel code, inside btrfs_run_delalloc_range(), > which calls should_nocow() to check if we should fall to NOCOW path. > > That should_nocow() would check if the inode has NODATACOW or PREALLOC > flags, then verify if there is any defrag request for it. > If no defrag request, then it can go NOCOW, thus break the COW requirement. > >> > [...] >>>> # Stays here at this compression level >>>> compsize -x /var/lib/pgsql >>>> Processed 5332 files, 575858 regular extents (591204 refs), 40 inline. >>>> Type Perc Disk Usage Uncompressed Referenced >>>> TOTAL 63% 51G 80G 80G >>>> none 100% 40G 40G 40G >>>> zstd 27% 10G 40G 40G >>>> prealloc 100% 5.0M 5.0M 5.5M >>> >>> Not sure if the preallocation is the cause, but maybe you can try >>> disabling preallocation of postgresql? >>> >>> As preallocation doesn't make that much sense on btrfs, there are too >>> many cases that can break the preallocation. >> >> >> I googled a lot and didn't find anything useful with preallocation and >> postgresql (looks like it doesn'use fallocate). > > I don't think so. > >> >> How can I find something about preallocation out? 
> > Above compsize is already showing there is some preallocated space. > > Thus I'm wondering if the preallocation is the cause. > > As should_nocow() would also check the PREALLOC inode flag, and tries > NOCOW path first (then falls to COW if needed) Yep, I just reproduced it: for any inode with the PREALLOC flag (i.e. the file has some preallocated range), even when we're writing into a range that needs COW anyway (e.g. new writes which enlarge the file), compression does not happen. # mkfs.btrfs test.img # mount test.img -o compress-force=zstd /mnt/btrfs # fallocate -l 128k /mnt/btrfs/file1 # xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file1 # xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file2 # sync Since file1 has a 128K preallocated range, the inode has the PREALLOC flag, which leads to no compression: item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160 generation 8 transid 8 size 262144 nbytes 262144 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 sequence 33 flags 0x10(PREALLOC) <<<< item 7 key (257 INODE_REF 256) itemoff 15796 itemsize 15 index 2 namelen 5 name: file1 item 8 key (257 EXTENT_DATA 0) itemoff 15743 itemsize 53 generation 8 type 2 (prealloc) prealloc data disk byte 13631488 nr 131072 prealloc data offset 0 nr 131072 item 9 key (257 EXTENT_DATA 131072) itemoff 15690 itemsize 53 generation 8 type 1 (regular) extent data disk byte 13762560 nr 131072 extent data offset 0 nr 131072 ram 131072 extent compression 0 (none) <<< Meanwhile the other file, which has no prealloc, goes down the regular 
item 10 key (258 INODE_ITEM 0) itemoff 15530 itemsize 160 generation 8 transid 8 size 262144 nbytes 131072 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 32 flags 0x0(none) item 11 key (258 INODE_REF 256) itemoff 15515 itemsize 15 index 3 namelen 5 name: file2 item 12 key (258 EXTENT_DATA 131072) itemoff 15462 itemsize 53 generation 8 type 1 (regular) extent data disk byte 13893632 nr 4096 extent data offset 0 nr 131072 ram 131072 extent compression 3 (zstd) To me, this looks like a bug, and the reason is exactly what I explained before. The worst thing is: as long as the inode has the PREALLOC flag, even if all preallocated extents are already used, it prevents compression from happening, forever, for that inode. Let me try to fix the fallback-to-COW path to include compression. Thanks, Qu > > Thanks, > Qu > >> >> >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-12-02 21:56 ` Qu Wenruo @ 2023-12-03 8:24 ` Gerhard Wiesinger 2023-12-03 9:11 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Gerhard Wiesinger @ 2023-12-03 8:24 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 02.12.2023 22:56, Qu Wenruo wrote: >> >>> >>> How can I find something about preallocation out? >> >> Above compsize is already showing there is some preallocated space. >> >> Thus I'm wondering if the preallocation is the cause. >> >> As should_nocow() would also check the PREALLOC inode flag, and tries >> NOCOW path first (then falls to COW if needed) > > Yep, I just reproduced it, for any INODE with PREALLOC flag (aka, the > file has some preallocated range), even we're writing into the range > that needs COW anyway (e.g. new writes which would enlarge the file), > the compression would not work anyway. > > # mkfs.btrfs test.img > # mount test.img -o compress-force=zstd /mnt/btrfs > # fallocate -l 128k /mnt/btrfs/file1 > # xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file1 > # xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file2 > # sync > > Since file1 has 128K preallocated range, thus the inode has PREALLOC > flag, and would lead to no compression: > > item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160 > generation 8 transid 8 size 262144 nbytes 262144 > block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 > sequence 33 flags 0x10(PREALLOC) <<<< > item 7 key (257 INODE_REF 256) itemoff 15796 itemsize 15 > index 2 namelen 5 name: file1 > item 8 key (257 EXTENT_DATA 0) itemoff 15743 itemsize 53 > generation 8 type 2 (prealloc) > prealloc data disk byte 13631488 nr 131072 > prealloc data offset 0 nr 131072 > item 9 key (257 EXTENT_DATA 131072) itemoff 15690 itemsize 53 > generation 8 type 1 (regular) > extent data disk byte 13762560 nr 131072 > extent data offset 0 nr 131072 ram 131072 > extent compression 0 (none) <<< > > Meanwhile for the other file, which has no prealloc, would go regular > 
compression path. > > item 10 key (258 INODE_ITEM 0) itemoff 15530 itemsize 160 > generation 8 transid 8 size 262144 nbytes 131072 > block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 > sequence 32 flags 0x0(none) > item 11 key (258 INODE_REF 256) itemoff 15515 itemsize 15 > index 3 namelen 5 name: file2 > item 12 key (258 EXTENT_DATA 131072) itemoff 15462 itemsize 53 > generation 8 type 1 (regular) > extent data disk byte 13893632 nr 4096 > extent data offset 0 nr 131072 ram 131072 > extent compression 3 (zstd) > > To me, this looks a bug, and the reason is exactly what I explained > before. > > The worst thing is, as long as the inode has PREALLOC flag, even if all > preallocated extents are used, it would prevent compression from > happening, forever for that inode. > > Let me try to fix the fallback to COW path to include compression. Thank you for reproducing it. I think we nailed it down. Is there a way to get that chunks/items output for a specific file? Thnx. Ciao, Gerhard ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-12-03 8:24 ` Gerhard Wiesinger @ 2023-12-03 9:11 ` Qu Wenruo 2023-12-03 9:45 ` Gerhard Wiesinger 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2023-12-03 9:11 UTC (permalink / raw) To: Gerhard Wiesinger, linux-btrfs On 2023/12/3 18:54, Gerhard Wiesinger wrote: > On 02.12.2023 22:56, Qu Wenruo wrote: >>> >>>> >>>> How can I find something about preallocation out? >>> >>> Above compsize is already showing there is some preallocated space. >>> >>> Thus I'm wondering if the preallocation is the cause. >>> >>> As should_nocow() would also check the PREALLOC inode flag, and tries >>> NOCOW path first (then falls to COW if needed) >> >> Yep, I just reproduced it, for any INODE with PREALLOC flag (aka, the >> file has some preallocated range), even we're writing into the range >> that needs COW anyway (e.g. new writes which would enlarge the file), >> the compression would not work anyway. >> >> # mkfs.btrfs test.img >> # mount test.img -o compress-force=zstd /mnt/btrfs >> # fallocate -l 128k /mnt/btrfs/file1 >> # xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file1 >> # xfs_io -f -c "pwrite 128k 128k" /mnt/btrfs/file2 >> # sync >> >> Since file1 has 128K preallocated range, thus the inode has PREALLOC >> flag, and would lead to no compression: >> >> item 6 key (257 INODE_ITEM 0) itemoff 15811 itemsize 160 >> generation 8 transid 8 size 262144 nbytes 262144 >> block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 >> sequence 33 flags 0x10(PREALLOC) <<<< >> item 7 key (257 INODE_REF 256) itemoff 15796 itemsize 15 >> index 2 namelen 5 name: file1 >> item 8 key (257 EXTENT_DATA 0) itemoff 15743 itemsize 53 >> generation 8 type 2 (prealloc) >> prealloc data disk byte 13631488 nr 131072 >> prealloc data offset 0 nr 131072 >> item 9 key (257 EXTENT_DATA 131072) itemoff 15690 itemsize 53 >> generation 8 type 1 (regular) >> extent data disk byte 13762560 nr 131072 >> extent data offset 0 nr 131072 ram 131072 >> extent 
compression 0 (none) <<< >> >> Meanwhile for the other file, which has no prealloc, would go regular >> compression path. >> >> item 10 key (258 INODE_ITEM 0) itemoff 15530 itemsize 160 >> generation 8 transid 8 size 262144 nbytes 131072 >> block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 >> sequence 32 flags 0x0(none) >> item 11 key (258 INODE_REF 256) itemoff 15515 itemsize 15 >> index 3 namelen 5 name: file2 >> item 12 key (258 EXTENT_DATA 131072) itemoff 15462 itemsize 53 >> generation 8 type 1 (regular) >> extent data disk byte 13893632 nr 4096 >> extent data offset 0 nr 131072 ram 131072 >> extent compression 3 (zstd) >> >> To me, this looks a bug, and the reason is exactly what I explained >> before. >> >> The worst thing is, as long as the inode has PREALLOC flag, even if all >> preallocated extents are used, it would prevent compression from >> happening, forever for that inode. >> >> Let me try to fix the fallback to COW path to include compression. > > > Thank you for reproducting it. Think we nailed it down. > > Is there a way to get the output of the file of the chunks/items? You can always dump the full subvolume (`btrfs ins dump-tree -t <subvolid> <device>`), then grep for the inodes which have the PREALLOC flag (`| grep -C 5 "flags.*PREALLOC"`), which would include the inode numbers; then you can pin down the inodes which have the PREALLOC flag and are not undergoing compression. I won't be surprised if most (if not all) files of postgresql have that flag. Thanks, Qu > > Thnx. > > Ciao, > > Gerhard > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-12-03 9:11 ` Qu Wenruo @ 2023-12-03 9:45 ` Gerhard Wiesinger 2023-12-03 10:19 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Gerhard Wiesinger @ 2023-12-03 9:45 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 03.12.2023 10:11, Qu Wenruo wrote: > >> >> Thank you for reproducting it. Think we nailed it down. >> >> Is there a way to get the output of the file of the chunks/items? > > You can always dump the full subvolume (`btrfs ins dump-tree -t > <subvolid> <device>`), then try to grep the inode which has PREALLOC > alloc (`| grep -C 5 "flags.*PREALLOC"), which would include the inode > number, then you can ping down the inodes which has PREALLOC flags and > not undergoing compression. > > I won't be surprised most (if not all) files of postgresql would have > that flag. Looks like only a small part has PREALLOC: find /var/lib/pgsql -type f | wc -l 5569 btrfs inspect-internal dump-tree /dev/mapper/datab | grep -i PREALLOC | wc -l 95 For reference: How to find the file at a certain btrfs inode https://serverfault.com/questions/746938/how-to-find-the-file-at-a-certain-btrfs-inode btrfs inspect-internal inode-resolve 13269 /var/lib/pgsql /var/lib/pgsql/16/data/base/16400/16419 find /var/lib/pgsql -xdev -inum 13269 /var/lib/pgsql/16/data/base/16400/16419 # Get files from inodes btrfs inspect-internal dump-tree /dev/mapper/datab | grep -C 5 "flags.*PREALLOC" | grep -i INODE | perl -pe 's/.*?\((.*?) .*/$1/' | sort | uniq | while read INODE; do echo -n "$INODE: ";btrfs inspect-internal inode-resolve ${INODE} /var/lib/pgsql; done # Number of inodes, count is consistent btrfs inspect-internal dump-tree /dev/mapper/datab | grep -C 5 "flags.*PREALLOC" | grep -i INODE | perl -pe 's/.*?\((.*?) 
.*/$1/' | sort | uniq | while read INODE; do echo -n "$INODE: ";btrfs inspect-internal inode-resolve ${INODE} /var/lib/pgsql; done | wc -l 95 All files are in subdirectories of /var/lib/pgsql/16/data/base/ Do you already have an idea for the fix? BTW: if compression is forced, shouldn't then just any "block" be compressed? Or what's the problem with the logic? Thnx. Ciao, Gerhard ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly 2023-12-03 9:45 ` Gerhard Wiesinger @ 2023-12-03 10:19 ` Qu Wenruo 2023-12-22 5:58 ` Gerhard Wiesinger 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2023-12-03 10:19 UTC (permalink / raw) To: Gerhard Wiesinger, linux-btrfs On 2023/12/3 20:15, Gerhard Wiesinger wrote: > On 03.12.2023 10:11, Qu Wenruo wrote: >> >>> >>> Thank you for reproducting it. Think we nailed it down. >>> >>> Is there a way to get the output of the file of the chunks/items? >> >> You can always dump the full subvolume (`btrfs ins dump-tree -t >> <subvolid> <device>`), then try to grep the inode which has PREALLOC >> alloc (`| grep -C 5 "flags.*PREALLOC"), which would include the inode >> number, then you can ping down the inodes which has PREALLOC flags and >> not undergoing compression. >> >> I won't be surprised most (if not all) files of postgresql would have >> that flag. > > Looks like only a small part has PREALLOC: > > find /var/lib/pgsql -type f | wc -l > 5569 > > btrfs inspect-internal dump-tree /dev/mapper/datab | grep -i PREALLOC | > wc -l > 95 > > For reference: > > How to find the file at a certain btrfs inode > https://serverfault.com/questions/746938/how-to-find-the-file-at-a-certain-btrfs-inode > > btrfs inspect-internal inode-resolve 13269 /var/lib/pgsql > /var/lib/pgsql/16/data/base/16400/16419 > > find /var/lib/pgsql -xdev -inum 13269 > /var/lib/pgsql/16/data/base/16400/16419 > > # Get files from inodes > > btrfs inspect-internal dump-tree /dev/mapper/datab | grep -C 5 > "flags.*PREALLOC" | grep -i INODE | perl -pe 's/.*?\((.*?) .*/$1/' | > sort | uniq | while read INODE; do echo -n "$INODE: ";btrfs > inspect-internal inode-resolve ${INODE} /var/lib/pgsql; done > > # Number of inodes, count is consistent > > btrfs inspect-internal dump-tree /dev/mapper/datab | grep -C 5 > "flags.*PREALLOC" | grep -i INODE | perl -pe 's/.*?\((.*?) 
.*/$1/' | > sort | uniq | while read INODE; do echo -n "$INODE: ";btrfs > inspect-internal inode-resolve ${INODE} /var/lib/pgsql; done | wc -l > > 95 > > All files are in subdirectories: /var/lib/pgsql/16/data/base/ > > Already an idea for the fix? We can copy the files (without using reflink) to a temporary location (preferably outside of btrfs), then copy the temporary copy back to overwrite all the existing files. The problem is still inside pgsql: as long as it does preallocation, the same problem will happen again. > > BTW: > > if compression is forced, should be then just any "block" be compressed? There is a long-existing problem with compression of preallocated ranges. One easy example: if we compress a preallocated range, what do we do with the gap (the compressed size is always smaller than the real size)? If we leave the gap, read performance can be even worse, as we now have to read several small extents with gaps between them, vs one large contiguous read. IIRC years ago, when I was a btrfs newbie, that's the direction I tried to go, but it never reached upstream. So you can see some of the reasons why we do not compress preallocated ranges. But I still don't believe the current behavior is the way to go. We should still try to compress as long as we know the write needs COW anyway, thus we should fix it. Thanks, Qu > > Or, what's the problem of the logic? > > Thnx. > > Ciao, > > Gerhard > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly
  2023-12-03 10:19 ` Qu Wenruo
@ 2023-12-22  5:58 ` Gerhard Wiesinger
  2023-12-22  6:13 ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Gerhard Wiesinger @ 2023-12-22 5:58 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs

On 03.12.2023 11:19, Qu Wenruo wrote:
>
>> BTW:
>>
>> if compression is forced, shouldn't just any "block" be compressed?
>
> There is a long-standing problem with compression of preallocated ranges.
>
> One easy example: if we compress a preallocated range, what do we do
> with the gap (the compressed size is always smaller than the real size)?
>
> If we leave the gap, then read performance can be even worse, as we now
> have to read several small extents with gaps between them, versus one
> large contiguous read.
>
> IIRC years ago, when I was a btrfs newbie, that's the direction I tried
> to go, but it never reached upstream.
>
> Thus you can see some of the reasons why we do not compress
> preallocated ranges.
>
> But I still don't believe we should keep the current behavior.
> We should still try to compress as long as we know the write needs COW,
> thus we should fix it.

Any progress with the fix?

Thnx.

Ciao,
Gerhard

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly
  2023-12-22  5:58 ` Gerhard Wiesinger
@ 2023-12-22  6:13 ` Qu Wenruo
  2024-08-11  9:39 ` Gerhard Wiesinger
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2023-12-22 6:13 UTC (permalink / raw)
To: Gerhard Wiesinger, linux-btrfs

On 2023/12/22 16:28, Gerhard Wiesinger wrote:
> On 03.12.2023 11:19, Qu Wenruo wrote:
>>
>>> BTW:
>>>
>>> if compression is forced, shouldn't just any "block" be compressed?
>>
>> There is a long-standing problem with compression of preallocated
>> ranges.
>>
>> One easy example: if we compress a preallocated range, what do we do
>> with the gap (the compressed size is always smaller than the real
>> size)?
>>
>> If we leave the gap, then read performance can be even worse, as we
>> now have to read several small extents with gaps between them, versus
>> one large contiguous read.
>>
>> IIRC years ago, when I was a btrfs newbie, that's the direction I
>> tried to go, but it never reached upstream.
>>
>> Thus you can see some of the reasons why we do not compress
>> preallocated ranges.
>>
>> But I still don't believe we should keep the current behavior.
>> We should still try to compress as long as we know the write needs
>> COW, thus we should fix it.
>
> Any progress with the fix?

Tried several solutions; the best one would still lead to reserved-space
underflow.

The proper fix would introduce larger changes to the whole delalloc
mechanism.

Thus it's not something that can be easily fixed yet.

Thanks,
Qu

>
> Thnx.
>
> Ciao,
>
> Gerhard

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTRFS doesn't compress on the fly
  2023-12-22  6:13 ` Qu Wenruo
@ 2024-08-11  9:39 ` Gerhard Wiesinger
  0 siblings, 0 replies; 12+ messages in thread
From: Gerhard Wiesinger @ 2024-08-11 9:39 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs

On 22.12.2023 07:13, Qu Wenruo wrote:
>
> On 2023/12/22 16:28, Gerhard Wiesinger wrote:
>> On 03.12.2023 11:19, Qu Wenruo wrote:
>>>
>>>> BTW:
>>>>
>>>> if compression is forced, shouldn't just any "block" be compressed?
>>>
>>> There is a long-standing problem with compression of preallocated
>>> ranges.
>>>
>>> One easy example: if we compress a preallocated range, what do we do
>>> with the gap (the compressed size is always smaller than the real
>>> size)?
>>>
>>> If we leave the gap, then read performance can be even worse, as we
>>> now have to read several small extents with gaps between them,
>>> versus one large contiguous read.
>>>
>>> IIRC years ago, when I was a btrfs newbie, that's the direction I
>>> tried to go, but it never reached upstream.
>>>
>>> Thus you can see some of the reasons why we do not compress
>>> preallocated ranges.
>>>
>>> But I still don't believe we should keep the current behavior.
>>> We should still try to compress as long as we know the write needs
>>> COW, thus we should fix it.
>>
>> Any progress with the fix?
>
> Tried several solutions; the best one would still lead to
> reserved-space underflow.
>
> The proper fix would introduce larger changes to the whole delalloc
> mechanism.
>
> Thus it's not something that can be easily fixed yet.
>
> Thanks,
> Qu

Any update on the issue or plans to fix it?

Thanks.

Ciao,
Gerhard

^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-08-11  9:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-11-30 11:21 BTRFS doesn't compress on the fly Gerhard Wiesinger
2023-11-30 20:53 ` Qu Wenruo
2023-12-02 12:02   ` Gerhard Wiesinger
2023-12-02 20:07     ` Qu Wenruo
2023-12-02 21:56       ` Qu Wenruo
2023-12-03  8:24         ` Gerhard Wiesinger
2023-12-03  9:11           ` Qu Wenruo
2023-12-03  9:45             ` Gerhard Wiesinger
2023-12-03 10:19               ` Qu Wenruo
2023-12-22  5:58                 ` Gerhard Wiesinger
2023-12-22  6:13                   ` Qu Wenruo
2024-08-11  9:39                     ` Gerhard Wiesinger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox