From: Chao Yu <chao@kernel.org>
To: Jeffle Xu <jefflexu@linux.alibaba.com>,
dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org,
linux-erofs@lists.ozlabs.org
Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org,
willy@infradead.org, linux-fsdevel@vger.kernel.org,
joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com,
tao.peng@linux.alibaba.com, gerry@linux.alibaba.com,
eguan@linux.alibaba.com, linux-kernel@vger.kernel.org,
luodaowen.backend@bytedance.com, tianzichen@kuaishou.com,
yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com,
zhujia.zj@bytedance.com
Subject: Re: [PATCH v11 00/22] fscache,erofs: fscache-based on-demand read semantics
Date: Tue, 10 May 2022 22:14:17 +0800
Message-ID: <4c788c74-67c1-c0c4-83a0-7ec7a4b95fba@kernel.org>
In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com>
On 2022/5/9 15:40, Jeffle Xu wrote:
> changes since v10:
> - rebase to 5.18-rc5
> - append a patch from Xin Yin to the patchset, implementing
> asynchronous readahead (patch 22)
>
>
> Kernel Patchset
> ---------------
> Git tree:
>
> https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v11
>
> Gitweb:
>
> https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v11
>
>
> User Guide for E2E Container Use Case
> -------------------------------------
> User guide:
>
> https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
>
> Video:
>
> https://youtu.be/F4IF2_DENXo
>
>
> User Daemon for Quick Test
> --------------------------
> Git tree:
>
> https://github.com/lostjeffle/demand-read-cachefilesd.git main
>
> Gitweb:
>
> https://github.com/lostjeffle/demand-read-cachefilesd
>
>
> Tested-by: Zichen Tian <tianzichen@kuaishou.com>
> Tested-by: Jia Zhu <zhujia.zj@bytedance.com>
>
>
> RFC: https://lore.kernel.org/all/YbRL2glGzjfZkVbH@B-P7TQMD6M-0146.local/t/
> v1: https://lore.kernel.org/lkml/47831875-4bdd-8398-9f2d-0466b31a4382@linux.alibaba.com/T/
> v2: https://lore.kernel.org/all/2946d871-b9e1-cf29-6d39-bcab30f2854f@linux.alibaba.com/t/
> v3: https://lore.kernel.org/lkml/20220209060108.43051-1-jefflexu@linux.alibaba.com/T/
> v4: https://lore.kernel.org/lkml/20220307123305.79520-1-jefflexu@linux.alibaba.com/T/#t
> v5: https://lore.kernel.org/lkml/202203170912.gk2sqkaK-lkp@intel.com/T/
> v6: https://lore.kernel.org/lkml/202203260720.uA5o7k5w-lkp@intel.com/T/
> v7: https://lore.kernel.org/lkml/557bcf75-2334-5fbb-d2e0-c65e96da566d@linux.alibaba.com/T/
> v8: https://lore.kernel.org/all/ac8571b8-0935-1f4f-e9f1-e424f059b5ed@linux.alibaba.com/T/
> v9: https://lore.kernel.org/lkml/2067a5c7-4e24-f449-4676-811d12e9ab72@linux.alibaba.com/T/
> v10: https://lore.kernel.org/all/20220425122143.56815-21-jefflexu@linux.alibaba.com/t/
>
>
> [Background]
> ============
> Nydus [1] is an image distribution service especially optimized for
> distribution over the network. Nydus is an excellent container image
> acceleration solution, since it only pulls data from the remote when
> needed (a.k.a. on-demand reading), and it also supports chunk-based
> deduplication, compression, etc.
>
> erofs (Enhanced Read-Only File System) is a filesystem designed for
> read-only scenarios (see Documentation/filesystems/erofs.rst).
>
> Over the past months we've been focusing on supporting the Nydus image
> service with the in-kernel erofs format [2]. In that case, each container
> image is organized as one bootstrap (metadata) and, optionally, multiple
> data blobs in erofs format. A massive number of container images may be
> stored on one machine.
>
> To accelerate container startup (fetching container images from the remote
> and then starting the container), we do hope that the bootstrap & blob
> files could support on-demand read. That is, erofs can be mounted and
> accessed even when the bootstrap/data blob files have not been fully
> downloaded. Then it'll have native performance once the data is available
> locally.
>
> That means we have to manage the cache state of the bootstrap/data blob
> files (on a cache hit, read directly from the local cache; on a cache miss,
> fetch the data somehow). It would be painful, and arguably unwise, for erofs
> to implement the cache management itself. Thus we prefer fscache/cachefiles
> to do the cache management instead.
>
> The fscache on-demand read feature is implemented in a generic way in the
> fscache subsystem, so that it can benefit other use cases and/or
> filesystems.
>
> [1] https://nydus.dev
> [2] https://sched.co/pcdL
>
>
> [Overall Design]
> ================
> Please refer to patch 8 ("cachefiles: document on-demand read mode") for
> more details.
>
> When working in the original mode, cachefiles mainly serves as a local cache
> for remote network filesystems, while in on-demand read mode, cachefiles can
> work in scenarios where on-demand read semantics are needed, e.g. container
> image distribution.
>
> The essential difference between these two modes is that, in original mode,
> on a cache miss, the netfs itself fetches data from the remote and then
> writes the fetched data into the cache file, while in on-demand read mode, a
> user daemon is responsible for fetching the data and then feeding it to the
> kernel fscache side.
>
> The on-demand read mode relies on a simple protocol for communication
> between the kernel and the user daemon.
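As an illustration only, a user daemon might decode one message of this protocol roughly as follows. This is a sketch that assumes the layout of the include/uapi/linux/cachefiles.h header added by this series: a cachefiles_msg header carrying msg_id, opcode, len and object_id, followed for read requests by a cachefiles_read payload with off and len. The opcode values, the local struct names and decode_msg() are assumptions/hypothetical, so the header itself remains the reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed opcode values, mirroring enum cachefiles_opcode. */
enum { OP_OPEN = 0, OP_CLOSE = 1, OP_READ = 2 };

/* Assumed mirror of struct cachefiles_msg (without the flexible data[]). */
struct msg_hdr {
	uint32_t msg_id;    /* identifies the request when completing it */
	uint32_t opcode;
	uint32_t len;       /* total message length, header included */
	uint32_t object_id; /* identifies the cache file / anon_fd */
};

/* Assumed mirror of struct cachefiles_read: the missing file range. */
struct read_req {
	uint64_t off;
	uint64_t len;
};

/*
 * Decode one message read from the device into *hdr, and into *rd for
 * read requests. Returns the opcode, or -1 if the buffer is truncated,
 * inconsistent or carries an unknown opcode.
 */
static int decode_msg(const void *buf, size_t n,
		      struct msg_hdr *hdr, struct read_req *rd)
{
	assert(hdr != NULL && rd != NULL);
	if (n < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));
	if (hdr->len < sizeof(*hdr) || hdr->len > n || hdr->opcode > OP_READ)
		return -1;
	if (hdr->opcode == OP_READ) {
		if (hdr->len < sizeof(*hdr) + sizeof(*rd))
			return -1;
		/* memcpy avoids unaligned access into the raw buffer */
		memcpy(rd, (const char *)buf + sizeof(*hdr), sizeof(*rd));
	}
	return (int)hdr->opcode;
}
```

Open and close requests carry their own payloads, which this sketch leaves to the caller.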
>
> The proposed implementation relies on the anonymous fd mechanism to avoid
> depending on the format of the cache file. When a fscache cache file is
> opened for the first time, an anon_fd associated with the cache file is sent
> to the user daemon. With the given anon_fd, the user daemon can fetch data and
> write it into the cache file in the background, even when the kernel has not
> triggered a cache miss. Besides, a write() syscall on the anon_fd ends up in
> the cachefiles kernel module, which writes the data into the cache file in
> the current cache file format.
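Here is a minimal sketch of the daemon side of the anon_fd path. It assumes that the daemon writes the fetched data with positioned writes (pwrite(2)) at the file offset of the range being filled. fill_range() is a hypothetical helper. Any writable regular-file descriptor can stand in for the anon_fd here, for example one obtained from tmpfile() for a quick local check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes of fetched data at offset off of the cache file behind
 * fd (the anon_fd in on-demand mode). Retries short writes and EINTR.
 * Returns 0 on success or a negative errno value.
 */
static int fill_range(int fd, off_t off, const void *data, size_t len)
{
	const char *p = data;

	assert(data != NULL || len == 0);
	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EIO;     /* no progress: give up rather than spin */
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Because the kernel side decides how the bytes land in the backing file, the daemon only needs file offsets and never needs to know the on-disk cache format.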
>
> 1. cache miss
> On a cache miss, the cachefiles kernel module notifies the user daemon with
> the anon_fd, along with the requested file range. When notified, the user
> daemon needs to fetch the data of the requested file range, and then write
> the fetched data into the cache file through the given anonymous fd. When it
> has finished processing the request, the user daemon needs to notify the
> kernel.
>
> After notifying the user daemon, the kernel read routine waits until the
> request has been handled by the user daemon. When it is woken up by the
> notification from the user daemon, i.e. once the corresponding hole has been
> filled by the user daemon, it retries the read of the same file range.
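The following toy model shows the wait/retry handshake in user space with pthreads. It is a sketch, not the cachefiles code. A reader thread stands in for the kernel read routine, a second thread stands in for the user daemon, and a condition variable replaces the real completion notification. In the series, that notification is delivered through the anon_fd, and the merged header names the ioctl CACHEFILES_IOC_READ_COMPLETE, but treat that detail as an assumption. All names below (toy_cache, toy_read, toy_daemon) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define NBLK 8   /* blocks in the toy cache file */
#define BLK  16  /* bytes per block */

/* Toy stand-in for one cache file plus the request channel. */
struct toy_cache {
	pthread_mutex_t lock;
	pthread_cond_t  cond;   /* broadcast on new request and on completion */
	bool filled[NBLK];      /* false = hole, i.e. a cache miss */
	char data[NBLK * BLK];
	int  pending;           /* block requested from the daemon, or -1 */
	bool stop;
};

static void toy_cache_init(struct toy_cache *c)
{
	memset(c, 0, sizeof(*c));
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->pending = -1;
}

/* "User daemon": wait for a request, fetch the block, report completion. */
static void *toy_daemon(void *arg)
{
	struct toy_cache *c = arg;

	pthread_mutex_lock(&c->lock);
	for (;;) {
		while (c->pending < 0 && !c->stop)
			pthread_cond_wait(&c->cond, &c->lock);
		if (c->stop)
			break;
		int b = c->pending;
		memset(c->data + b * BLK, 'A' + b, BLK); /* pretend remote fetch */
		c->filled[b] = true;                     /* hole is now filled */
		c->pending = -1;
		pthread_cond_broadcast(&c->cond);        /* "request completed" */
	}
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* "Kernel read routine": on a miss, post a request, sleep until the
 * hole is filled, then retry the read of the same block. */
static char toy_read(struct toy_cache *c, int b)
{
	char ch;

	assert(b >= 0 && b < NBLK);
	pthread_mutex_lock(&c->lock);
	while (!c->filled[b]) {
		if (c->pending < 0) {
			c->pending = b;
			pthread_cond_broadcast(&c->cond);
		}
		pthread_cond_wait(&c->cond, &c->lock);
	}
	ch = c->data[b * BLK];
	pthread_mutex_unlock(&c->lock);
	return ch;
}

static void toy_cache_stop(struct toy_cache *c)
{
	pthread_mutex_lock(&c->lock);
	c->stop = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}
```

The first read of a block blocks until the daemon thread has filled it. A second read of the same block finds it filled and returns without involving the daemon.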
>
> 2. cache hit
> Once the data is already present in the cache file, the netfs reads from the
> cache file directly.
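The hit/miss decision comes down to one question: is the requested range backed by data in the sparse cache file? The existing cachefiles read-preparation path probes the backing file with SEEK_DATA/SEEK_HOLE for this. The user-space sketch below mimics that probe with lseek(2). range_is_cached() is a hypothetical helper. On filesystems without hole reporting, SEEK_HOLE simply returns EOF, so the check degrades to "everything below EOF is cached".

```c
#define _GNU_SOURCE          /* SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Cache hit iff every byte of [off, off + len) is backed by data in the
 * sparse cache file behind fd. A hole at off, or off at/after EOF
 * (lseek fails with ENXIO), is a miss. Note: moves the file offset.
 */
static bool range_is_cached(int fd, off_t off, off_t len)
{
	off_t data, hole;

	assert(off >= 0 && len > 0);
	data = lseek(fd, off, SEEK_DATA);
	if (data < 0 || data != off)
		return false;           /* EOF/error, or a hole starts at off */
	hole = lseek(fd, off, SEEK_HOLE);
	return hole >= off + len;   /* next hole lies beyond the range */
}
```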
>
>
> [Advantages of fscache-based on-demand read]
> ============================================
> 1. Asynchronous prefetch
> In the current mechanism, fscache is responsible for cache state
> management, while the data plane (fetching data from local/remote on a cache
> miss) is handled on the user daemon side, even when not driven by any
> filesystem request. In addition, if the cached data is already available
> locally, fscache uses it directly and no longer traps into user space.
>
> Therefore, unlike event-driven approaches, the fscache on-demand user
> daemon can also fetch data (from the remote) asynchronously in the
> background, just like most multi-threaded HTTP downloaders.
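Here is a minimal sketch of such background prefetch. It assumes that the daemon tracks which fixed-size chunks it has already written, and that it walks the missing chunks on its own schedule, without waiting for any kernel request. prefetch_missing(), fetch_fn and fake_fetch() are hypothetical. A real daemon would run the loop on worker threads, write through the anon_fd, and lock done[] against concurrent on-demand requests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical backend hook: copy chunk idx of the blob into buf. */
typedef int (*fetch_fn)(size_t idx, char *buf, size_t chunk);

/* Fill every chunk not yet present in the cache file behind fd.
 * Returns the number of chunks written, or -1 on error. */
static long prefetch_missing(int fd, bool *done, size_t nchunks,
			     size_t chunk, fetch_fn fetch)
{
	char buf[4096];
	long written = 0;

	assert(chunk > 0 && chunk <= sizeof(buf));
	for (size_t i = 0; i < nchunks; i++) {
		if (done[i])
			continue;              /* already cached: skip */
		if (fetch(i, buf, chunk) != 0)
			return -1;
		if (pwrite(fd, buf, chunk, (off_t)(i * chunk)) != (ssize_t)chunk)
			return -1;
		done[i] = true;
		written++;
	}
	return written;
}

/* Example fetch hook standing in for a remote download. */
static int fake_fetch(size_t idx, char *buf, size_t chunk)
{
	memset(buf, 'a' + (int)idx, chunk);
	return 0;
}
```

Chunks that an earlier on-demand request already filled are skipped, so prefetch and on-demand reads never fetch the same chunk twice.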
>
> 2. Flexible request amplification
> Since the data plane can be controlled independently by the user daemon,
> the user daemon can also fetch more data from the remote than the
> filesystem actually requests for small I/O sizes. The data fetched in bulk
> then becomes available at once, and fscache won't need to trap into the
> user daemon again for it.
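One simple amplification policy can be sketched as follows: round the requested range out to whole, aligned fetch units, then clamp it to the blob size. amplify() and the 1 MiB unit used below are hypothetical choices for illustration, not values taken from the series.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t off;
	uint64_t len;
};

/* Expand [off, off + len) to whole unit-sized, unit-aligned chunks,
 * clamped to blob_size. unit must be a non-zero power of two. */
static struct range amplify(uint64_t off, uint64_t len,
			    uint64_t unit, uint64_t blob_size)
{
	struct range r;
	uint64_t end = off + len;

	assert(unit != 0 && (unit & (unit - 1)) == 0);
	r.off = off & ~(unit - 1);                /* round start down */
	end = (end + unit - 1) & ~(unit - 1);     /* round end up */
	if (end > blob_size)
		end = blob_size;                  /* never fetch past the blob */
	r.len = end > r.off ? end - r.off : 0;
	return r;
}
```

With a 1 MiB unit, a 100-byte read at offset 5000 turns into one 1 MiB fetch starting at offset 0. Subsequent small reads in that megabyte are then cache hits.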
>
> 3. Support for massive numbers of blobs
> This mechanism can naturally support a large number of backing files,
> and thus benefits densely deployed scenarios. In our use cases, one
> container image consists of one bootstrap (required) and multiple
> chunk-deduplicated data blobs (optional).
>
> For example, one container image for node.js corresponds to ~20 files in
> total. In a densely deployed environment, there could be hundreds of
> containers and thus thousands of backing files on one machine.
>
>
> [Following Steps]
> =================
> The following improvements are on our TODO list, and will take shape as
> development proceeds:
>
> - Sharing data blobs between multiple filesystems. In the current
> implementation, each filesystem registers a unique fscache_volume, so the
> backing file for a data blob cannot be shared between different erofs
> filesystems. Later we need to introduce a shared domain in order to share
> fscache_volume, so that data blobs can be shared between container images
> to some degree.
>
> - In-memory extent-based data sharing, e.g., different files sharing the
> same chunk of a data blob. In the current implementation, each erofs file
> maintains its own page cache, so the page cache for the same chunk may be
> duplicated among multiple files sharing that chunk.
>
> - Other useful features, including multiple cachefiles daemon support,
> etc.
>
>
> Jeffle Xu (21):
> cachefiles: extract write routine
> cachefiles: notify the user daemon when looking up cookie
> cachefiles: unbind cachefiles gracefully in on-demand mode
> cachefiles: notify the user daemon when withdrawing cookie
> cachefiles: implement on-demand read
> cachefiles: enable on-demand read mode
> cachefiles: add tracepoints for on-demand read mode
> cachefiles: document on-demand read mode
> erofs: make erofs_map_blocks() generally available
> erofs: add fscache mode check helper
> erofs: register fscache volume
> erofs: add fscache context helper functions
> erofs: add anonymous inode caching metadata for data blobs
> erofs: add erofs_fscache_read_folios() helper
> erofs: register fscache context for primary data blob
> erofs: register fscache context for extra data blobs
> erofs: implement fscache-based metadata read
> erofs: implement fscache-based data read for non-inline layout
> erofs: implement fscache-based data read for inline layout
> erofs: implement fscache-based data readahead
> erofs: add 'fsid' mount option
For erofs parts:
Acked-by: Chao Yu <chao@kernel.org>
Thanks,
>
> Xin Yin (1):
> erofs: change to use asynchronous io for fscache readpage/readahead
>
> .../filesystems/caching/cachefiles.rst | 178 ++++++
> fs/cachefiles/Kconfig | 12 +
> fs/cachefiles/Makefile | 1 +
> fs/cachefiles/daemon.c | 117 +++-
> fs/cachefiles/interface.c | 2 +
> fs/cachefiles/internal.h | 78 +++
> fs/cachefiles/io.c | 76 ++-
> fs/cachefiles/namei.c | 16 +-
> fs/cachefiles/ondemand.c | 503 +++++++++++++++++
> fs/erofs/Kconfig | 10 +
> fs/erofs/Makefile | 1 +
> fs/erofs/data.c | 26 +-
> fs/erofs/fscache.c | 522 ++++++++++++++++++
> fs/erofs/inode.c | 4 +
> fs/erofs/internal.h | 49 ++
> fs/erofs/super.c | 105 +++-
> fs/erofs/sysfs.c | 4 +-
> include/linux/fscache.h | 1 +
> include/linux/netfs.h | 1 +
> include/trace/events/cachefiles.h | 176 ++++++
> include/uapi/linux/cachefiles.h | 68 +++
> 21 files changed, 1871 insertions(+), 79 deletions(-)
> create mode 100644 fs/cachefiles/ondemand.c
> create mode 100644 fs/erofs/fscache.c
> create mode 100644 include/uapi/linux/cachefiles.h
>