From: Xin Yin <yinxin.x@bytedance.com>
To: xiang@kernel.org
Cc: kernel-team@android.com, linux-erofs@lists.ozlabs.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	lsf-pc@lists.linuxfoundation.org, dhowells@redhat.com,
	jefflexu@linux.alibaba.com, Xin Yin <yinxin.x@bytedance.com>
Subject: Re: [LSF/MM/BPF TOPIC] Image-based read-only filesystem: further use cases & directions
Date: Thu, 23 Feb 2023 18:38:16 +0800
Message-ID: <20230223103816.2623-1-yinxin.x@bytedance.com>
In-Reply-To: <Y7vTpeNRaw3Nlm9B@debian>

On 2023/1/9 16:43, Gao Xiang wrote:
> Hi folks,
> 
> * Background *
> 
> We've been continuously working on forming a useful read-only
> (immutable) image solution since the end of 2017 (as a part of our
> work) until now as everyone may know:  EROFS.
> 
> It has already successfully landed on (about) billions of
> Android-related devices, other types of embedded devices and containers
> with many vendors involved, and we've always been seeking more use
> cases such as incremental immutable rootfs, app sandboxes or packages
> (Android apk? with many duplicated libraries), dataset packages, etc.
> 
> The reasons why we always do believe immutable images can benefit
> various use cases are:
> 
>   - much easier for all vendors to ship/distribute/keep original signing
>     (golden) images to each instance;
> 
>   - (combined with the writable layer such as overlayfs) easy to roll
>     back to the original shipped state or do incremental updates;
> 
>   - easy to check data corruption or do data recovery (no matter
>     whether physical device or network errors);
> 
>   - easy for real storage devices to do hardware write-protection for
>     immutable images;
> 
>   - can do various offline algorithms (such as reduced metadata,
>     content-defined rolling hash deduplication, compression) to minimize
>     image sizes;
> 
>   - initrd with FSDAX to avoid double caching with advantages above;
> 
>   - and more.
> 
> In 2019, an LSF/MM/BPF topic was put forward to show EROFS's initial use
> cases [1] as the read-only Android rootfs of a single instance on
> resource-limited devices so that effective compression became quite
> important at that time.
> 
> 
> * Problem *
> 
> In addition to enhancing data compression for single-instance deployment,
> as a self-contained approach (so that all use cases can share the only
> _one_ signed image), we've also been focusing in recent years on multiple
> instances (such as containers or apps, where each image represents a
> complete filesystem tree) running together on one device with similar
> data, so that effective data deduplication, on-demand lazy pulling, and
> page cache sharing among such different golden images have become vital
> as well.
> 
> 
> * Current progress *
> 
> In order to resolve the challenges above, we've worked out:
> 
>   - (v5.15) chunk-based inodes (to form inode extents) to do data
>     deduplication within a single image;
> 
>   - (v5.16) multiple shared blobs (to keep content-defined data) in
>     addition to the primary blob (to keep filesystem metadata) for wider
>     deduplication across different images;
> 
>   - (v5.19) file-based distribution by introducing in-kernel local
>     caching fscache and on-demand lazy pulling feature [2];
> 
>   - (v6.1) shared domain to share such multiple shared blobs in
>     fscache mode [3];
> 
>   - [RFC] preliminary page cache sharing between different images [4].
> 
> 
> * Potential topics to discuss *
> 
>   - data verification of different images with thousands (or more)
>     shared blobs [5];
> 
>   - encryption with per-extent keys for confidential containers [5][6];
> 
>   - current page cache sharing limitation due to mm reverse mapping and
>     finer (folio or page-based) page cache sharing among images/blobs
>     [4][7];
> 
>   - more effective in-kernel local caching features for fscache such as
>     failover and daemonless;
> 
>   - (wild preliminary ideas, maybe) overlayfs partial copy-up with
>     fscache as the upper layer in order to form a unique caching
>     subsystem for better space saving?
>

We are also interested in these topics. Page cache sharing is an exciting
feature and could save a lot of memory in high-density deployment
scenarios, since we can already share blobs.

We hope to have further discussion on the failover, multiple daemons/dirs,
and daemonless features of fscache & cachefiles, so that we can arrive at
a better form for our production use.

I am also looking forward to the opportunity to discuss online if I can't
attend in person.

Thanks,
Xin Yin

>   - FSDAX enhancements for initial ramdisk or other use cases;
> 
>   - other issues when landing.
> 
> 
> Finally, if our efforts (or plans) also make sense to you, we do hope
> more people could join us. Thanks!
> 
> [1] https://lore.kernel.org/r/f44b1696-2f73-3637-9964-d73e3d5832b7@huawei.com
> [2] https://lore.kernel.org/r/Yoj1AcHoBPqir++H@debian
> [3] https://lore.kernel.org/r/20220918043456.147-1-zhujia.zj@bytedance.com
> [4] https://lore.kernel.org/r/20230106125330.55529-1-jefflexu@linux.alibaba.com
> [5] https://lore.kernel.org/r/Y6KqpGscDV6u5AfQ@B-P7TQMD6M-0146.local
> [6] https://lwn.net/SubscriberLink/918893/4d389217f9b8d679
> [7] https://lwn.net/Articles/895907
> 
> Thanks,
> Gao Xiang

-- 
2.25.1


Thread overview: 4+ messages
2023-01-09  8:43 [LSF/MM/BPF TOPIC] Image-based read-only filesystem: further use cases & directions Gao Xiang
2023-02-23 10:38 ` Xin Yin [this message]
2023-02-24  3:10 ` Zhang Yi
2023-02-28  6:12 ` Jingbo Xu
