public inbox for linux-kernel@vger.kernel.org
From: JeffleXu <jefflexu@linux.alibaba.com>
To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org,
	chao@kernel.org, linux-erofs@lists.ozlabs.org
Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org,
	willy@infradead.org, linux-fsdevel@vger.kernel.org,
	joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com,
	tao.peng@linux.alibaba.com, gerry@linux.alibaba.com,
	eguan@linux.alibaba.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 00/21] fscache,erofs: fscache-based on-demand read semantics
Date: Fri, 18 Mar 2022 19:48:48 +0800
Message-ID: <884cbd35-9d88-82a5-972a-39de2f4c8bc0@linux.alibaba.com>
In-Reply-To: <20220307123305.79520-1-jefflexu@linux.alibaba.com>

Hi David,

We really value the fscache-based on-demand read feature, and we believe
it will let fscache benefit more scenarios. Our community partners are
also quite interested in this feature.

I would appreciate it if you could take a look at it, and please let me
know if you have any concerns.


Thanks.
Jeffle


On 3/7/22 8:32 PM, Jeffle Xu wrote:
> changes since v3:
> - cachefiles: The current implementation relies on the anonymous fd
>   mechanism to avoid depending on the cache file format. When a cache
>   file is opened for the first time, an anon_fd associated with the
>   cache file is sent to the user daemon. The user daemon can then fetch
>   data and write it to the cache file through the given anon_fd. Each
>   write to the anon_fd ends up in the cachefiles kernel module, which
>   writes the data to the cache file in the latest cache file format.
>   Thus the on-demand read mode keeps working no matter how the cache
>   file format changes in the future. (patch 4)
> - cachefiles: the on-demand read mode reuses the existing
>   "/dev/cachefiles" devnode (patch 3)
> - erofs: squash several commits implementing readahead into single
>   commit (patch 20)
> - erofs: refactor the readahead routine, so that it can read multiple
>   pages each round (patch 20)
> - patches 1 and 7 have already been cherry-picked by the maintainers, but
>   have not yet been merged into master. They are kept here for completeness.
> 
> 
> RFC: https://lore.kernel.org/all/YbRL2glGzjfZkVbH@B-P7TQMD6M-0146.local/t/
> v1: https://lore.kernel.org/lkml/47831875-4bdd-8398-9f2d-0466b31a4382@linux.alibaba.com/T/
> v2: https://lore.kernel.org/all/2946d871-b9e1-cf29-6d39-bcab30f2854f@linux.alibaba.com/t/
> v3: https://lore.kernel.org/lkml/20220209060108.43051-1-jefflexu@linux.alibaba.com/T/
> 
> [Background]
> ============
> Nydus [1] is a container image distribution service specially optimised
> for distribution over the network. Nydus accelerates container image
> distribution because it only pulls data from the remote when it's
> really needed, a.k.a. on-demand reading.
> 
> erofs (Enhanced Read-Only File System) is a filesystem specially
> optimised for read-only scenarios. (Documentation/filesystems/erofs.rst)
> 
> Recently we have been focusing on erofs in the container image
> distribution scenario [2], trying to combine it with nydus. In this case,
> erofs can be mounted from one bootstrap file (metadata) with (optional)
> multiple data blob files (data) stored on another local filesystem. (All
> these files are actually image files in erofs disk format.)
> 
> To accelerate container startup (fetching the container image from the
> remote and then starting the container), we hope that the bootstrap blob
> file can support on-demand read. That is, erofs can be mounted and accessed
> even when the bootstrap/data blob files have not been fully downloaded.
> 
> That means we have to manage the cache state of the bootstrap/data blob
> files (on a cache hit, read directly from the local cache; on a cache
> miss, fetch the data somehow). It would be painful, and arguably unwise,
> for erofs to implement the cache management itself. Thus we prefer to let
> fscache/cachefiles do the cache management. Besides, the on-demand read
> feature is general and can benefit other use cases if it is implemented
> at the fscache level.
> 
> [1] https://nydus.dev
> [2] https://sched.co/pcdL
> 
> 
> [Overall Design]
> ================
> 
> Please refer to patch 6 ("cachefiles: document on-demand read mode") for
> more details.
> 
> When working in the original mode, cachefiles mainly serves as a local cache
> for a remote networking fs, while in on-demand read mode, cachefiles can
> serve scenarios where on-demand read semantics are needed, e.g. container
> image distribution.
> 
> The essential difference between these two modes is that, in the original
> mode, on a cache miss the netfs itself fetches data from the remote and then
> writes the fetched data into the cache file, while in on-demand read mode a
> user daemon is responsible for fetching the data and writing it to the cache
> file.
> 
> The on-demand read mode relies on a simple protocol used for communication
> between the kernel and the user daemon.
> 
> The current implementation relies on the anonymous fd mechanism to avoid
> depending on the cache file format. When a cache file is opened for the
> first time, an anon_fd associated with the cache file is sent to the user
> daemon. With the given anon_fd, the user daemon can fetch data and write it
> into the cache file in the background, even before the kernel has triggered
> a cache miss. Besides, each write() syscall on the anon_fd ends up in the
> cachefiles kernel module, which writes the data to the cache file in the
> latest cache file format.
> 
> 1. cache miss
> On a cache miss, the cachefiles kernel module notifies the user daemon of
> the anon_fd, along with the requested file range. When notified, the user
> daemon needs to fetch the data of the requested file range and then write
> the fetched data into the cache file through the given anonymous fd. When
> it has finished processing the request, the user daemon needs to notify
> the kernel.
> 
> After notifying the user daemon, the kernel read routine blocks until the
> request has been handled by the user daemon. When it is woken up by the
> notification from the user daemon, i.e. the corresponding hole has been
> filled by the user daemon, it retries the read of the same file range.
> 
> 2. cache hit
> Once the data is present in the cache file, netfs reads from the cache file directly.
> 
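The cache miss / cache hit flow above can be modelled with a small
user-space simulation. This is only a sketch under stated assumptions, not
the cachefiles protocol or its message format: CacheFile, daemon(), read(),
the 4096-byte BLOCK granularity and the queue-based notification are all
hypothetical names invented for illustration. The cache file is a buffer
plus a set of filled blocks, the "kernel" read path blocks on a miss, and a
daemon thread fills the hole and then wakes the reader so it can retry.

```python
import queue
import threading

BLOCK = 4096  # granularity of the simulated cache state (assumption)


class CacheFile:
    """Toy stand-in for a cache file: a buffer plus the set of filled blocks."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.filled = set()          # indices of blocks that are no longer holes
        self.lock = threading.Lock()

    def covers(self, off, length):
        first, last = off // BLOCK, (off + length - 1) // BLOCK
        with self.lock:
            return all(b in self.filled for b in range(first, last + 1))

    def write(self, off, buf):
        # Stands in for the daemon's write() to the anon_fd; off is block-aligned.
        with self.lock:
            self.data[off:off + len(buf)] = buf
            self.filled.update(range(off // BLOCK, (off + len(buf) - 1) // BLOCK + 1))


def daemon(requests, cache, remote):
    """User daemon: fill the requested range, then notify the waiting reader."""
    while True:
        req = requests.get()
        if req is None:              # shutdown marker used only by this sketch
            return
        off, length, done = req
        start = off - off % BLOCK                                    # round down
        end = min(len(remote), -(-(off + length) // BLOCK) * BLOCK)  # round up
        cache.write(start, remote[start:end])   # "fetch from remote", fill the hole
        done.set()                              # completion notification


def read(cache, requests, off, length, stats):
    """Read path: on a miss, notify the daemon, block, then retry the same range.

    Assumes the requested range lies within the file.
    """
    while not cache.covers(off, length):
        stats["miss"] += 1
        done = threading.Event()
        requests.put((off, length, done))
        done.wait()
    with cache.lock:
        return bytes(cache.data[off:off + length])


remote = bytes(range(256)) * 64      # 16 KiB of "remote" data
cache = CacheFile(len(remote))
requests = queue.Queue()
stats = {"miss": 0}
worker = threading.Thread(target=daemon, args=(requests, cache, remote))
worker.start()
first = read(cache, requests, 5000, 100, stats)    # miss: daemon fills block 1
second = read(cache, requests, 5000, 100, stats)   # hit: served from the cache
requests.put(None)
worker.join()
```

The second read never reaches the daemon, which is the behaviour the
"cache hit" case relies on.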
> 
> [Advantage of fscache-based demand-read]
> ========================================
> 1. Asynchronous Prefetch
> In the current mechanism, fscache is responsible for cache state management,
> while the data plane (fetching data from local/remote on a cache miss) is
> handled on the user daemon side.
> 
> If the data is already present in the backing file, the upper fs (e.g.
> erofs) reads from the backing file directly and is not trapped to
> user space anymore. Thus the user daemon can fetch data (from the remote)
> asynchronously in the background, and thus accelerate access to the
> backing file to some degree.
> 
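The effect of asynchronous prefetch can be illustrated by counting how many
reads still have to be trapped to the user daemon. This is a toy model under
stated assumptions, not a measurement: trapped_reads() and the block
numbering are hypothetical, and a "block" simply stands for some range of
the backing file.

```python
def trapped_reads(read_blocks, prefetched):
    """Return how many reads had to wait for the user daemon."""
    filled = set(prefetched)       # blocks the daemon wrote ahead of any request
    trapped = 0
    for blk in read_blocks:
        if blk not in filled:      # hole: the read is trapped to the daemon
            trapped += 1
            filled.add(blk)        # the daemon fills the hole on demand
    return trapped


reads = [0, 3, 3, 7, 1]
no_prefetch = trapped_reads(reads, [])            # every first touch is trapped
partial_prefetch = trapped_reads(reads, range(4)) # only block 7 is still a hole
full_prefetch = trapped_reads(reads, range(8))    # nothing is trapped
```

The more of the backing file the daemon has written in the background, the
fewer reads leave the kernel.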
> 2. Support massive blob files
> Besides, this mechanism supports a large number of backing files, and
> thus can benefit densely deployed scenarios.
> 
> In our use case, one container image can correspond to one
> bootstrap file (required) and multiple data blob files (optional). For
> example, one container image for node.js corresponds to ~20 files
> in total. In a densely deployed environment, there could be as many as
> hundreds of containers and thus thousands of backing files on one
> machine.
> 
> 
> [Test]
> ======
> You can run a quick test with
> https://github.com/lostjeffle/demand-read-cachefilesd
> 
> 
> 
> Jeffle Xu (21):
>   fscache: export fscache_end_operation()
>   cachefiles: export write routine
>   cachefiles: introduce on-demand read mode
>   cachefiles: notify user daemon with anon_fd when opening cache file
>   cachefiles: implement on-demand read
>   cachefiles: document on-demand read mode
>   erofs: use meta buffers for erofs_read_superblock()
>   erofs: export erofs_map_blocks()
>   erofs: add mode checking helper
>   erofs: register global fscache volume
>   erofs: add cookie context helper functions
>   erofs: add anonymous inode managing page cache of blob file
>   erofs: add erofs_fscache_read_pages() helper
>   erofs: register cookie context for bootstrap blob
>   erofs: implement fscache-based metadata read
>   erofs: implement fscache-based data read for non-inline layout
>   erofs: implement fscache-based data read for inline layout
>   erofs: register cookie context for data blobs
>   erofs: implement fscache-based data read for data blobs
>   erofs: implement fscache-based data readahead
>   erofs: add 'uuid' mount option
> 
>  .../filesystems/caching/cachefiles.rst        | 159 +++++
>  fs/cachefiles/Kconfig                         |  11 +
>  fs/cachefiles/daemon.c                        | 576 +++++++++++++++++-
>  fs/cachefiles/internal.h                      |  48 ++
>  fs/cachefiles/io.c                            |  72 ++-
>  fs/cachefiles/namei.c                         |  16 +-
>  fs/erofs/Makefile                             |   3 +-
>  fs/erofs/data.c                               |  18 +-
>  fs/erofs/fscache.c                            | 496 +++++++++++++++
>  fs/erofs/inode.c                              |   6 +-
>  fs/erofs/internal.h                           |  30 +
>  fs/erofs/super.c                              | 106 +++-
>  fs/fscache/internal.h                         |  11 -
>  fs/nfs/fscache.c                              |   8 -
>  include/linux/fscache.h                       |  15 +
>  include/linux/netfs.h                         |   1 +
>  include/trace/events/cachefiles.h             |   2 +
>  include/uapi/linux/cachefiles.h               |  48 ++
>  18 files changed, 1526 insertions(+), 100 deletions(-)
>  create mode 100644 fs/erofs/fscache.c
>  create mode 100644 include/uapi/linux/cachefiles.h
> 

-- 
Thanks,
Jeffle

