From: Matthew Wilcox <willy@infradead.org>
To: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
linux-btrfs@vger.kernel.org, Nicolas Pitre <nico@fluxnic.net>,
Gao Xiang <xiang@kernel.org>, Chao Yu <chao@kernel.org>,
linux-erofs@lists.ozlabs.org, Jaegeuk Kim <jaegeuk@kernel.org>,
linux-f2fs-devel@lists.sourceforge.net, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org,
David Woodhouse <dwmw2@infradead.org>,
Richard Weinberger <richard@nod.at>,
linux-mtd@lists.infradead.org,
David Howells <dhowells@redhat.com>,
netfs@lists.linux.dev, Paulo Alcantara <pc@manguebit.org>,
Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
ntfs3@lists.linux.dev, Steve French <sfrench@samba.org>,
linux-cifs@vger.kernel.org,
Phillip Lougher <phillip@squashfs.org.uk>
Subject: Compressed files & the page cache
Date: Tue, 15 Jul 2025 21:40:42 +0100
Message-ID: <aHa8ylTh0DGEQklt@casper.infradead.org>

I've started looking at how the page cache can help filesystems handle
compressed data better. Feedback would be appreciated! I'll probably
say a few things which are obvious to anyone who knows how compressed
files work, but I'm trying to be explicit about my assumptions.

First, I believe that all filesystems work by compressing fixed-size
plaintext into variable-sized compressed blocks. This would be a good
point to stop reading and tell me about counterexamples.

From what I've been reading in all your filesystems, you want to
allocate extra pages in the page cache in order to store the excess data
retrieved along with the page that you're actually trying to read. That's
because compressing in larger chunks leads to better compression.
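
To make the "excess data" concrete, here is a minimal user-space sketch of
which page-cache pages one decompression fills when a single page is
requested. It assumes 4KiB pages and a power-of-two compression block size
that is a multiple of the page size; SKETCH_PAGE_SIZE, struct page_range and
block_pages_for_offset() are hypothetical names for illustration, not kernel
or filesystem APIs.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4KiB pages */

/* Hypothetical helper type: the run of pages one compressed block covers. */
struct page_range {
	uint64_t first;  /* page index of the first page in the block */
	uint64_t count;  /* number of pages the block decompresses to */
};

/*
 * Given a byte offset into the file and the (power-of-two) size of the
 * plaintext compression block, return every page that decompressing the
 * block produces, not just the page containing pos.
 */
static struct page_range block_pages_for_offset(uint64_t pos,
						uint32_t block_size)
{
	/* Round pos down to the start of its compression block. */
	uint64_t block_start = pos & ~((uint64_t)block_size - 1);
	struct page_range r = {
		.first = block_start / SKETCH_PAGE_SIZE,
		.count = block_size / SKETCH_PAGE_SIZE,
	};

	return r;
}
```

With a 64KiB block, a read at byte offset 70000 lands in page 17, but the
block covers pages 16 through 31, so 15 pages arrive alongside the one that
was asked for.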

There's some discrepancy between filesystems as to whether you need scratch
space for decompression. Some filesystems read the compressed data into
the page cache and decompress in-place, while other filesystems read the
compressed data into scratch pages and decompress into the page cache.

There also seems to be some discrepancy between filesystems as to whether
the decompression involves vmap() of all the memory allocated or whether the
decompression routines can handle doing kmap_local() on individual pages.

So, my proposal is that filesystems tell the page cache that their minimum
folio size is the compression block size. That seems to be around 64k,
so not an unreasonable minimum allocation size. That removes all the
extra code in filesystems to allocate extra memory in the page cache.
It means we don't attempt to track dirtiness at a sub-folio granularity
(there's no point, we have to write back the entire compressed block
at once). We also get a single virtually contiguous block ... if you're
willing to ditch HIGHMEM support. Or there's a proposal to introduce a
vmap_file() which would give us a virtually contiguous chunk of memory
(and could be trivially turned into a noop for the case of trying to
vmap a single large folio).
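
As a rough illustration of the proposal, the sketch below turns a compression
block size into the folio order a filesystem would request as its minimum.
It is plain user-space C, assuming 4KiB pages (a page shift of 12);
SKETCH_PAGE_SHIFT and min_folio_order() are hypothetical names. The message
above does not name a kernel interface; in recent kernels the natural
candidate appears to be mapping_set_folio_min_order(), but treat that as an
assumption rather than part of the proposal.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4KiB pages */

/*
 * Smallest folio order whose size (page size << order) is at least
 * block_size. For a power-of-two block size that is a multiple of the
 * page size, the folio is exactly one compression block.
 */
static unsigned int min_folio_order(uint32_t block_size)
{
	unsigned int order = 0;

	while (((uint64_t)1 << (SKETCH_PAGE_SHIFT + order)) < block_size)
		order++;

	return order;
}
```

For the 64k block size mentioned above this gives order 4, i.e. a folio of
16 pages; a 128KiB block would give order 5.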

Thread overview:
2025-07-15 20:40 Matthew Wilcox [this message]
2025-07-15 21:22 ` Boris Burkov
2025-07-15 23:32 ` Gao Xiang
2025-07-16 0:28 ` Gao Xiang
2025-07-21 1:02 ` Barry Song
2025-07-21 3:14 ` Gao Xiang
2025-07-21 10:25 ` Jan Kara
2025-07-21 11:36 ` Qu Wenruo
2025-07-21 11:52 ` Gao Xiang
2025-07-22 3:54 ` Barry Song
2025-07-21 11:40 ` Gao Xiang
2025-07-21 0:43 ` Barry Song
2025-07-16 0:57 ` Qu Wenruo
2025-07-16 1:16 ` Gao Xiang
2025-07-16 4:54 ` Qu Wenruo
2025-07-16 5:40 ` Gao Xiang
2025-07-16 22:37 ` Phillip Lougher
2025-07-17 2:49 ` Eric Biggers
2025-07-17 3:18 ` Gao Xiang