public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: Gao Xiang <hsiangkao@linux.alibaba.com>,
	Matthew Wilcox <willy@infradead.org>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, Nicolas Pitre <nico@fluxnic.net>,
	Gao Xiang <xiang@kernel.org>, Chao Yu <chao@kernel.org>,
	linux-erofs@lists.ozlabs.org, Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	David Woodhouse <dwmw2@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	linux-mtd@lists.infradead.org,
	David Howells <dhowells@redhat.com>,
	netfs@lists.linux.dev, Paulo Alcantara <pc@manguebit.org>,
	Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
	ntfs3@lists.linux.dev, Steve French <sfrench@samba.org>,
	linux-cifs@vger.kernel.org,
	Phillip Lougher <phillip@squashfs.org.uk>
Subject: Re: Compressed files & the page cache
Date: Wed, 16 Jul 2025 14:24:58 +0930	[thread overview]
Message-ID: <b43fe06d-204b-4f47-a7ff-0c405365bc48@suse.com> (raw)
In-Reply-To: <eeee0704-9e76-4152-bb8e-b5a0e096ec18@linux.alibaba.com>



在 2025/7/16 10:46, Gao Xiang 写道:
> ...
> 
>>
>>>
>>> There's some discrepancy between filesystems whether you need scratch
>>> space for decompression.  Some filesystems read the compressed data into
>>> the pagecache and decompress in-place, while other filesystems read the
>>> compressed data into scratch pages and decompress into the page cache.
>>
>> Btrfs goes the scratch pages way. Decompression in-place looks a 
>> little tricky to me. E.g. what if there is only one compressed page, 
>> and it decompressed to 4 pages.
> 
> Decompression in-place mainly optimizes full decompression (so that CPU
> cache line won't be polluted by temporary buffers either), in fact,
> EROFS supports the hybird way.
> 
>>
>> Won't the plaintext over-write the compressed data halfway?
> 
> Personally I'm very familiar with LZ4, LZMA, and DEFLATE
> algorithm internals, and I also have experience to build LZMA,
> DEFLATE compressors.
> 
> It's totally workable for LZ4, in short it will read the compressed
> data at the end of the decompressed buffers, and the proper margin
> can make this almost always succeed.

I guess that's why btrfs can not go that way.

Due to data COW, we're totally possible to hit a case that we only want 
to read out one single plaintext block from a compressed data extent 
(the compressed size can even be larger than one block).

In that case such in-place decompression will definitely not work.

[...]

>> All the decompression/compression routines all support swapping input/ 
>> output buffer when one of them is full.
>> So kmap_local() is completely feasible.
> 
> I think one of the btrfs supported algorithm LZO is not,

It is, the tricky part is btrfs is implementing its own TLV structure 
for LZO compression.

And btrfs does extra padding to ensure no TLV (compressed data + header) 
structure will cross block boundary.

So btrfs LZO compression is still able to swap out input/output halfway, 
mostly due to the btrfs' specific design.

Thanks,
Qu

> because the
> fastest LZ77-family algorithms like LZ4, LZO just operates on virtual
> consecutive buffers and treat the decompressed buffer as LZ77 sliding
> window.
> 
> So that either you need to allocate another temporary consecutive
> buffer (I believe that is what btrfs does) or use vmap() approach,
> EROFS is interested in the vmap() one.
> 
> Thanks,
> Gao Xiang
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> So, my proposal is that filesystems tell the page cache that their 
>>> minimum
>>> folio size is the compression block size.  That seems to be around 64k,
>>> so not an unreasonable minimum allocation size.  That removes all the
>>> extra code in filesystems to allocate extra memory in the page cache.
>>> It means we don't attempt to track dirtiness at a sub-folio granularity
>>> (there's no point, we have to write back the entire compressed bock
>>> at once).  We also get a single virtually contiguous block ... if you're
>>> willing to ditch HIGHMEM support.  Or there's a proposal to introduce a
>>> vmap_file() which would give us a virtually contiguous chunk of memory
>>> (and could be trivially turned into a noop for the case of trying to
>>> vmap a single large folio).
>>>
>>>
> 


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

  reply	other threads:[~2025-07-16  4:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-15 20:40 Compressed files & the page cache Matthew Wilcox
2025-07-15 21:22 ` Boris Burkov
2025-07-15 23:32 ` Gao Xiang
2025-07-16  0:28   ` Gao Xiang
2025-07-21  1:02     ` Barry Song
2025-07-21  3:14       ` Gao Xiang
2025-07-21 10:25         ` Jan Kara
2025-07-21 11:36           ` Qu Wenruo
2025-07-21 11:52             ` Gao Xiang
2025-07-22  3:54             ` Barry Song
2025-07-21 11:40           ` Gao Xiang
2025-07-21  0:43   ` Barry Song
2025-07-16  0:57 ` Qu Wenruo
2025-07-16  1:16   ` Gao Xiang
2025-07-16  4:54     ` Qu Wenruo [this message]
2025-07-16  5:40       ` Gao Xiang
2025-07-16 22:37 ` Phillip Lougher
2025-07-17  2:49   ` Eric Biggers
2025-07-17  3:18     ` Gao Xiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b43fe06d-204b-4f47-a7ff-0c405365bc48@suse.com \
    --to=wqu@suse.com \
    --cc=almaz.alexandrovich@paragon-software.com \
    --cc=chao@kernel.org \
    --cc=clm@fb.com \
    --cc=dhowells@redhat.com \
    --cc=dsterba@suse.com \
    --cc=dwmw2@infradead.org \
    --cc=hsiangkao@linux.alibaba.com \
    --cc=jack@suse.cz \
    --cc=jaegeuk@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=netfs@lists.linux.dev \
    --cc=nico@fluxnic.net \
    --cc=ntfs3@lists.linux.dev \
    --cc=pc@manguebit.org \
    --cc=phillip@squashfs.org.uk \
    --cc=richard@nod.at \
    --cc=sfrench@samba.org \
    --cc=willy@infradead.org \
    --cc=xiang@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox