From: Eric Biggers <ebiggers@kernel.org>
To: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>, Chris Mason <clm@fb.com>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
linux-btrfs@vger.kernel.org, Nicolas Pitre <nico@fluxnic.net>,
Gao Xiang <xiang@kernel.org>, Chao Yu <chao@kernel.org>,
linux-erofs@lists.ozlabs.org, Jaegeuk Kim <jaegeuk@kernel.org>,
linux-f2fs-devel@lists.sourceforge.net, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org,
David Woodhouse <dwmw2@infradead.org>,
Richard Weinberger <richard@nod.at>,
linux-mtd@lists.infradead.org,
David Howells <dhowells@redhat.com>,
netfs@lists.linux.dev, Paulo Alcantara <pc@manguebit.org>,
Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
ntfs3@lists.linux.dev, Steve French <sfrench@samba.org>,
linux-cifs@vger.kernel.org
Subject: Re: Compressed files & the page cache
Date: Wed, 16 Jul 2025 19:49:03 -0700
Message-ID: <20250717024903.GA1288@sol>
In-Reply-To: <f4b9faf9-8efd-4396-b080-e712025825ab@squashfs.org.uk>

On Wed, Jul 16, 2025 at 11:37:28PM +0100, Phillip Lougher wrote:
> > There also seems to be some discrepancy between filesystems whether the
> > decompression involves vmap() of all the memory allocated or whether the
> > decompression routines can handle doing kmap_local() on individual pages.
> >
>
> Squashfs does both, and this depends on whether the decompression
> algorithm implementation in the kernel is multi-shot or single-shot.
>
> The zlib/xz/zstd decompressors are multi-shot, in that you can call them
> multiply, giving them an extra input or output buffer when it runs out.
> This means you can get them to output into a 4K page at a time, without
> requiring the pages to be contiguous. kmap_local() can be called on each
> page before passing it to the decompressor.
While those compression libraries do provide streaming APIs, it's sort
of an illusion. They still need the uncompressed data in a virtually
contiguous buffer for the LZ77 match finding and copying to work. So,
internally they copy the uncompressed data into a virtually contiguous
buffer. I suspect that vmap() (or vm_map_ram() which is what f2fs uses)
is actually more efficient than these streaming APIs, since it avoids
the internal copy. But it would need to be measured.
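
To make the point about the internal copy concrete, here is a minimal
userspace sketch in Python (not kernel code, and not how zlib, xz, or
zstd are actually implemented). The names copy_match_contiguous and
PagedDecoder are hypothetical and exist only for illustration. With one
contiguous output buffer, an LZ77 match is a plain indexed copy within
that buffer. When output lands in separate 4K pages, the sketch keeps a
contiguous history window on the side, so every output byte is written
twice.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def copy_match_contiguous(out: bytearray, distance: int, length: int) -> None:
    """Resolve an LZ77 match directly inside one contiguous output buffer."""
    if distance <= 0 or distance > len(out):
        raise ValueError("match distance reaches outside the history")
    start = len(out) - distance
    # Copy byte by byte: a match may overlap the bytes it is producing.
    for i in range(length):
        out.append(out[start + i])


class PagedDecoder:
    """Toy decoder that emits into separate pages and keeps a side window."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.window = bytearray()    # contiguous history used to resolve matches
        self.pages = [bytearray()]   # non-contiguous output, one page at a time
        self.extra_copied = 0        # bytes written a second time into the window

    def _emit(self, byte: int) -> None:
        if len(self.pages[-1]) == PAGE_SIZE:
            self.pages.append(bytearray())
        self.pages[-1].append(byte)
        self.window.append(byte)     # the extra copy the contiguous case avoids
        self.extra_copied += 1
        if len(self.window) > self.window_size:
            del self.window[0]       # simple (slow) sliding window

    def literal(self, data: bytes) -> None:
        for byte in data:
            self._emit(byte)

    def match(self, distance: int, length: int) -> None:
        if distance <= 0 or distance > len(self.window):
            raise ValueError("match distance reaches outside the window")
        for _ in range(length):
            # The window grows by one per emitted byte, so index -distance
            # always points at the next source byte of the match.
            self._emit(self.window[-distance])
```

In this sketch extra_copied always equals the total output size, which
is the overhead a vmap()ed or otherwise contiguous destination would
avoid. Whether that overhead matters in practice would still need to be
measured.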
> > So, my proposal is that filesystems tell the page cache that their minimum
> > folio size is the compression block size. That seems to be around 64k,
> > so not an unreasonable minimum allocation size. That removes all the
> > extra code in filesystems to allocate extra memory in the page cache.
> > It means we don't attempt to track dirtiness at a sub-folio granularity
> > (there's no point, we have to write back the entire compressed block
> > at once). We also get a single virtually contiguous block ... if you're
> > willing to ditch HIGHMEM support. Or there's a proposal to introduce a
> > vmap_file() which would give us a virtually contiguous chunk of memory
> > (and could be trivially turned into a noop for the case of trying to
> > vmap a single large folio).
... but of course, if we could get a virtually contiguous buffer
"for free" (at least in the !HIGHMEM case) as in the above proposal,
that would clearly be the best option.
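
For a sense of scale, the arithmetic behind the quoted proposal can be
sketched as follows. This is a Python illustration, not kernel code;
min_folio_order is a hypothetical helper, and the 4K default page size
is an assumption. "Order" here means log2 of the number of pages in a
folio, so the smallest folio covering one compression block has order
ceil(log2(block_size / page_size)).

```python
def min_folio_order(compress_block_size: int, page_size: int = 4096) -> int:
    """Smallest folio order whose size covers one compression block.

    Hypothetical helper for illustration only; a folio of order n spans
    2**n pages.
    """
    if compress_block_size <= 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    # Round up to whole pages, then up to the next power of two.
    pages = -(-compress_block_size // page_size)
    return (pages - 1).bit_length()
```

With 4K pages, a 64K compression block gives 16 pages, i.e. order 4,
and a 128K block gives order 5. With 16K pages, the same 64K block
needs only order 2.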
- Eric