Readahead for compressed data

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	Phillip Lougher <phillip@squashfs.org.uk>,
	linux-erofs@lists.ozlabs.org, linux-btrfs@vger.kernel.org,
	linux-ntfs-dev@lists.sourceforge.net, ntfs3@lists.linux.dev,
	linux-bcache@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>, Hsin-Yi Wang <hsinyi@chromium.org>
Subject: Readahead for compressed data
Date: Thu, 21 Oct 2021 21:17:40 +0100	[thread overview]
Message-ID: <YXHK5HrQpJu9oy8w@casper.infradead.org> (raw)

As far as I can tell, the following filesystems support compressed data:

bcachefs, btrfs, erofs, ntfs, squashfs, zisofs

I'd like to make it easier and more efficient for filesystems to
implement compressed data.  There are a lot of approaches in use today,
but none of them seem quite right to me.  I'm going to lay out a few
design considerations next and then propose a solution.  Feel free to
tell me I've got the constraints wrong, or suggest alternative solutions.

When we call ->readahead from the VFS, the VFS has decided which pages
are going to be the most useful to bring in, but it doesn't know how
pages are bundled together into blocks.  As I've learned from talking to
Gao Xiang, sometimes the filesystem doesn't know either, so this isn't
something we can teach the VFS.

We (David) added readahead_expand() recently to let the filesystem
opportunistically add pages to the page cache "around" the area requested
by the VFS.  That reduces the number of times the filesystem has to
decompress the same block.  But it can fail (due to memory allocation
failures or pages already being present in the cache).  So filesystems
still have to implement some kind of fallback.

For many (all?) compression algorithms (all?) the data must be mapped at
all times.  Calling kmap() and kunmap() would be an intolerable overhead.
At the same time, we cannot write to a page in the page cache which is
marked Uptodate.  It might be mapped into userspace, or a read() be in
progress against it.  For writable filesystems, it might even be dirty!
As far as I know, no compression algorithm supports "holes", implying
that we must allocate memory which is then discarded.

To me, this calls for a vmap() based approach.  So I'm thinking
something like ...

void *readahead_get_block(struct readahead_control *ractl, loff_t start,
			size_t len);
void readahead_put_block(struct readahead_control *ractl, void *addr,
			bool success);

Once you've figured out which bytes this encrypted block expands to, you
call readahead_get_block(), specifying the offset in the file and length
and get back a pointer.  When you're done decompressing that block of
the file, you get rid of it again.

It's the job of readahead_get_block() to allocate additional pages
into the page cache or temporary pages.  readahead_put_block() will
mark page cache pages as Uptodate if 'success' is true, and unlock
them.  It'll free any temporary pages.

Thoughts?  Anyone want to be the guinea pig?  ;-)

next             reply	other threads:[~2021-10-21 20:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-21 20:17 Matthew Wilcox [this message]
2021-10-22  0:22 ` Readahead for compressed data Gao Xiang
2021-10-22  1:04 ` Phillip Susi
2021-10-22  1:28   ` Gao Xiang
2021-10-22  1:39     ` Gao Xiang
2021-10-22  2:09   ` Phillip Lougher
2021-10-22  2:31     ` Gao Xiang
2021-10-22  8:41   ` Jan Kara
2021-10-22  9:11     ` Gao Xiang
2021-10-22  9:22       ` Qu Wenruo
2021-10-22  9:39         ` Gao Xiang
2021-10-22  9:54           ` Gao Xiang
2021-10-22 10:40             ` Qu Wenruo
2021-10-25 18:59     ` Phillip Susi
2021-10-22  4:36 ` Phillip Lougher
2021-10-29  6:15 ` Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXHK5HrQpJu9oy8w@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=dhowells@redhat.com \
    --cc=hsinyi@chromium.org \
    --cc=jack@suse.cz \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ntfs-dev@lists.sourceforge.net \
    --cc=ntfs3@lists.linux.dev \
    --cc=phillip@squashfs.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).