Re: Compressed files & the page cache

From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Qu Wenruo <wqu@suse.com>, Matthew Wilcox <willy@infradead.org>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, Nicolas Pitre <nico@fluxnic.net>,
	Gao Xiang <xiang@kernel.org>, Chao Yu <chao@kernel.org>,
	linux-erofs@lists.ozlabs.org, Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	David Woodhouse <dwmw2@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	linux-mtd@lists.infradead.org,
	David Howells <dhowells@redhat.com>,
	netfs@lists.linux.dev, Paulo Alcantara <pc@manguebit.org>,
	Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
	ntfs3@lists.linux.dev, Steve French <sfrench@samba.org>,
	linux-cifs@vger.kernel.org,
	Phillip Lougher <phillip@squashfs.org.uk>
Subject: Re: Compressed files & the page cache
Date: Wed, 16 Jul 2025 13:40:02 +0800
Message-ID: <e143f730-6ae7-491e-985e-cc021411edd8@linux.alibaba.com>
In-Reply-To: <b43fe06d-204b-4f47-a7ff-0c405365bc48@suse.com>

On 2025/7/16 12:54, Qu Wenruo wrote:
> 
> 
> On 2025/7/16 10:46, Gao Xiang wrote:
>> ...
>>
>>>
>>>>
>>>> There's some discrepancy between filesystems whether you need scratch
>>>> space for decompression.  Some filesystems read the compressed data into
>>>> the pagecache and decompress in-place, while other filesystems read the
>>>> compressed data into scratch pages and decompress into the page cache.
>>>
>>> Btrfs goes the scratch pages way. Decompression in-place looks a little
>>> tricky to me. E.g. what if there is only one compressed page, and it
>>> decompresses to 4 pages?
>>
>> Decompression in-place mainly optimizes full decompression (so that CPU
>> cache lines won't be polluted by temporary buffers either). In fact,
>> EROFS supports a hybrid of both approaches.
>>
>>>
>>> Won't the plaintext over-write the compressed data halfway?
>>
>> Personally I'm very familiar with LZ4, LZMA, and DEFLATE
>> algorithm internals, and I also have experience building LZMA
>> and DEFLATE compressors.
>>
>> It's totally workable for LZ4. In short, the compressed data is read
>> from the end of the decompressed buffer, and a proper margin makes
>> this almost always succeed.
> 
> I guess that's why btrfs can not go that way.
> 
> Due to data COW, it's entirely possible to hit a case where we only want
> to read a single plaintext block out of a compressed data extent (the
> compressed size can even be larger than one block).
> 
> In that case such in-place decompression will definitely not work.

OK, I think that's mainly due to the btrfs compression design.  Another
point is that in-place decompression can also be used with multi-shot
interfaces (as you said, "swapping input/ output buffer when one of them
is full") such as deflate, lzma and zstd.  Because the APIs are
multi-shot, you know when the decompressed and compressed buffers start
to overlap, and you only need to copy the overlapped compressed data to
some additional temporary buffers (which can be shared among multiple
compressed extents).
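
To make the overlap handling concrete, here is a minimal sketch in Python.
It is only a model of the idea, not EROFS code: a toy run-length codec
stands in for deflate/lzma/zstd, and the names rle_compress() and
decompress_inplace() are made up for illustration.  The compressed input
sits at the tail of the output buffer, and only the still-unread input is
copied to a bounce buffer at the moment a write would clobber it.

```python
def rle_compress(data: bytes) -> bytes:
    """Toy run-length encoder: emits (count, byte) pairs, count in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def decompress_inplace(compressed: bytes, out_len: int):
    """Decode into a buffer whose tail initially holds the compressed input.

    Returns (plaintext, bounced), where bounced is the number of compressed
    bytes that had to be copied to a temporary (bounce) buffer.
    """
    if len(compressed) > out_len:
        raise ValueError("sketch assumes compressed size <= decompressed size")
    buf = bytearray(out_len)
    rd = out_len - len(compressed)
    buf[rd:] = compressed            # input lives at the end of the output
    src, in_place = buf, True        # where the decoder currently reads from
    end, wr, bounced = out_len, 0, 0
    while rd < end:
        count, value = src[rd], src[rd + 1]
        rd += 2
        if wr + count > out_len:
            raise ValueError("corrupted input: output overrun")
        if in_place and wr + count > rd:
            # The next write would overwrite unread input: move only the
            # still-unread tail to a bounce buffer and keep decoding.
            src = bytes(buf[rd:])
            bounced = len(src)
            rd, end, in_place = 0, len(src), False
        buf[wr:wr + count] = bytes((value,)) * count
        wr += count
    return bytes(buf), bounced
```

For b"a" * 8 + b"bcd" the incompressible tail forces a bounce of the last
6 compressed bytes (out of 8); for b"bc" + b"a" * 16 nothing is copied at
all, since the writer never catches up with the reader.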

That has less overhead than allocating temporary buffers to hold the
compressed data for the whole I/O process (again, because it only needs
a very small number of buffers during decompression), especially for
slow (or even network) storage devices.

I do understand that btrfs may not consider this because it targets
different users, but one of the main EROFS use cases is low-overhead
decompression under memory pressure (maybe plus cheap storage), where
LZ4 + in-place decompression is useful.
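
As a rough illustration of the "proper margin" for LZ4, the sketch below
computes where the compressed data would be placed for in-place decoding.
The margin formula mirrors LZ4_DECOMPRESS_INPLACE_MARGIN() from lz4.h as
far as I recall it ((compressedSize >> 8) + 32); treat that, and the helper
names lz4_inplace_margin() and inplace_layout(), as assumptions of this
sketch rather than a statement of what any filesystem does.

```python
def lz4_inplace_margin(compressed_size: int) -> int:
    """Extra tail bytes so the writer never overtakes the reader.

    Assumed to match LZ4_DECOMPRESS_INPLACE_MARGIN() in lz4.h.
    """
    return (compressed_size >> 8) + 32


def inplace_layout(compressed_size: int, decompressed_size: int):
    """Return (buffer_size, input_offset) for in-place decoding.

    The output starts at offset 0; the compressed input is placed so that
    it ends exactly at the end of the buffer.
    """
    buffer_size = decompressed_size + lz4_inplace_margin(compressed_size)
    input_offset = buffer_size - compressed_size
    return buffer_size, input_offset
```

For the "one compressed page decompresses to 4 pages" case with 4 KiB
pages, inplace_layout(4096, 16384) gives a 16432-byte buffer with the input
at offset 12336, i.e. only 48 bytes beyond the decompressed size.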

Anyway, I'm not advocating in-place decompression for every case.  I
think that, unlike plain (unencoded) data, encoded data can be organized
on disk and can utilize the page cache in various ways.  Because of
different on-disk designs and target users, there will be different
usage modes.

As for EROFS, we have natively supported compressed large folios since
6.11, and order-0 folios are always among our use cases, so I don't
think this will give users extra benefits.

> 
> [...]
> 
>>> All the decompression/compression routines support swapping input/ output buffer when one of them is full.
>>> So kmap_local() is completely feasible.
>>
>> I think LZO, one of the algorithms btrfs supports, is not,
> 
> It is; the tricky part is that btrfs implements its own TLV structure for LZO compression.
> 
> And btrfs adds extra padding to ensure that no TLV (compressed data + header) structure crosses a block boundary.
> 
> So btrfs LZO compression is still able to swap the input/output buffers halfway, mostly due to the btrfs-specific design.

OK, that looks very much like a btrfs-specific design: it is effectively
per-block compression for LZO, which increases the compressed size.  I
know btrfs may not care, but that's not the case for EROFS anyway.
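
The size cost of such block-aligned framing can be seen with a small
sketch.  This is NOT the actual btrfs on-disk LZO format: the record layout
(a 4-byte little-endian length followed by the payload), the zero padding
rule, and the names frame() and unframe() are all hypothetical, chosen only
to show that keeping every record inside one block costs padding bytes.

```python
import struct

HDR = struct.Struct("<I")  # hypothetical 4-byte little-endian payload length


def frame(segments, block_size):
    """Pack (header + payload) records so that none crosses a block boundary."""
    out = bytearray()
    for seg in segments:
        need = HDR.size + len(seg)
        if not seg or need > block_size:
            raise ValueError("segment must be non-empty and fit in one block")
        room = block_size - len(out) % block_size
        if need > room:
            out += bytes(room)  # zero padding up to the next block boundary
        out += HDR.pack(len(seg)) + seg
    return bytes(out)


def unframe(blob, block_size):
    """Walk the records back out, skipping the padding added by frame()."""
    segs, pos = [], 0
    while pos < len(blob):
        room = block_size - pos % block_size
        n = HDR.unpack_from(blob, pos)[0] if room >= HDR.size else 0
        if n == 0:
            pos += room  # padding (or no room for a header): next block
            continue
        pos += HDR.size
        segs.append(blob[pos:pos + n])
        pos += n
    return segs
```

With 16-byte blocks and payloads of 8, 8 and 3 bytes, the unpadded stream
would be 31 bytes, while frame() produces 39 bytes; the difference is pure
padding, which is the overhead mentioned above.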

Thanks,
Gao Xiang

> 
> Thanks,
> Qu
