linux-fsdevel.vger.kernel.org archive mirror
From: Gao Xiang <xiang@kernel.org>
To: lsf-pc@lists.linuxfoundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	kernel-team@android.com, linux-kernel@vger.kernel.org
Subject: [LSF/MM/BPF TOPIC] Image-based read-only filesystem: further use cases & directions
Date: Mon, 9 Jan 2023 16:43:17 +0800
Message-ID: <Y7vTpeNRaw3Nlm9B@debian>

Hi folks,

* Background *

As many of you may know, we've been working continuously on a useful
read-only (immutable) image solution, EROFS, since the end of 2017 (as
a part of our work).

It has already landed on (about) billions of Android-related devices,
other types of embedded devices, and containers, with many vendors
involved. We're always seeking more use cases such as incremental
immutable rootfs, app sandboxes or packages (Android APKs, with many
duplicated libraries?), dataset packages, etc.

The reasons why we believe immutable images can benefit various use
cases are:

  - much easier for all vendors to ship, distribute, and keep the
    original signed (golden) images for each instance;

  - (combined with a writable layer such as overlayfs) easy to roll
    back to the original shipped state or do incremental updates;

  - easy to check for data corruption or do data recovery (whether
    caused by physical device errors or network errors);

  - easy for real storage devices to do hardware write-protection for
    immutable images;

  - can apply various offline algorithms (such as reduced metadata,
    content-defined rolling hash deduplication, compression) to minimize
    image sizes;

  - initrd with FSDAX to avoid double caching, with the advantages above;

  - and more.
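
To make the "content-defined rolling hash deduplication" item above a
bit more concrete, here is a minimal sketch of the general technique in
Python.  It is only an illustration: the gear-style rolling hash, the
names (cdc_chunks, dedup_ratio) and the parameters (MASK, MIN_CHUNK,
MAX_CHUNK) are all assumptions made up for this example, and it is not
what mkfs.erofs or any other EROFS tooling actually implements.

```python
import hashlib
import random

# Hypothetical parameters for illustration only (not taken from EROFS).
MASK = (1 << 11) - 1   # a cut point is expected roughly every 2 KiB
MIN_CHUNK = 512        # never cut before this many bytes
MAX_CHUNK = 8192       # always cut at this many bytes

# Gear table: one fixed pseudo-random 32-bit value per possible byte.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes):
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # The 32-bit hash only depends on the most recent 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        if (h & MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_ratio(blobs):
    """Return (total_bytes, unique_bytes) after chunk-level deduplication."""
    seen = {}
    total = 0
    for blob in blobs:
        for chunk in cdc_chunks(blob):
            total += len(chunk)
            seen.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return total, sum(seen.values())
```

Since chunk boundaries depend on the data itself rather than on fixed
offsets, identical content found in different files or images tends to
produce identical chunks, which can then be stored once.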

In 2019, an LSF/MM/BPF topic was put forward to show the initial EROFS
use case [1]: the read-only Android rootfs of a single instance on
resource-limited devices, for which effective compression was quite
important at that time.


* Problem *

In addition to enhancing data compression for single-instance
deployment, as a self-contained approach (so that all use cases can
share only _one_ signed image), we've also been focusing in recent
years on multiple instances (such as containers or apps, where each
image represents a complete filesystem tree) living together on one
device with similar data.  As a result, effective data deduplication,
on-demand lazy pulling, and page cache sharing among such different
golden images have become vital as well.


* Current progress *

In order to resolve the challenges above, we've worked out:

  - (v5.15) chunk-based inodes (to form inode extents) to do data
    deduplication within a single image;

  - (v5.16) multiple shared blobs (to keep content-defined data) in
    addition to the primary blob (to keep filesystem metadata) for wider
    deduplication across different images;

  - (v5.19) file-based distribution by introducing in-kernel local
    caching (fscache) and an on-demand lazy pulling feature [2];

  - (v6.1) shared domain to share such multiple shared blobs in
    fscache mode [3];

  - [RFC] preliminary page cache sharing between different images [4].
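
As a rough illustration of how chunk-based inodes and shared blobs fit
together, below is a toy model in Python.  It is a sketch under stated
assumptions, not the EROFS on-disk format: SharedBlob, build_image,
read_file and the fixed CHUNK_SIZE are hypothetical names and values
chosen only for this example.  Each "image" keeps just its metadata
(file size plus a list of chunk offsets), while the chunk data lives
in a blob that several images can reference.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this toy model


class SharedBlob:
    """Append-only data blob holding each unique chunk exactly once."""

    def __init__(self):
        self.data = bytearray()
        self.index = {}  # chunk digest -> offset within self.data

    def add(self, chunk: bytes) -> int:
        digest = hashlib.sha256(chunk).digest()
        if digest not in self.index:
            self.index[digest] = len(self.data)
            self.data += chunk
        return self.index[digest]


def build_image(files: dict, blob: SharedBlob) -> dict:
    """Return toy metadata: path -> (file size, chunk offsets in blob)."""
    meta = {}
    for path, content in files.items():
        offsets = []
        for pos in range(0, len(content), CHUNK_SIZE):
            # Pad the tail so every stored chunk is CHUNK_SIZE bytes long.
            chunk = content[pos:pos + CHUNK_SIZE].ljust(CHUNK_SIZE, b"\0")
            offsets.append(blob.add(chunk))
        meta[path] = (len(content), offsets)
    return meta


def read_file(meta: dict, blob: SharedBlob, path: str) -> bytes:
    """Reassemble a file from its chunk offsets, trimming tail padding."""
    size, offsets = meta[path]
    out = b"".join(bytes(blob.data[o:o + CHUNK_SIZE]) for o in offsets)
    return out[:size]
```

Building two images that both contain the same library against one
SharedBlob stores the library's chunks only once, and both images end
up pointing at the same offsets for it.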


* Potential topics to discuss *

  - data verification of different images with thousands (or more)
    shared blobs [5];

  - encryption with per-extent keys for confidential containers [5][6];

  - current page cache sharing limitations due to mm reverse mapping,
    and finer (folio or page-based) page cache sharing among
    images/blobs [4][7];

  - more effective in-kernel local caching features for fscache, such
    as failover and daemonless operation;

  - (wild preliminary ideas, maybe) overlayfs partial copy-up with
    fscache as the upper layer in order to form a unique caching
    subsystem for better space saving?

  - FSDAX enhancements for initial ramdisk or other use cases;

  - other issues when landing.


Finally, if our efforts (or plans) also make sense to you, we do hope
more people can join us.  Thanks!

[1] https://lore.kernel.org/r/f44b1696-2f73-3637-9964-d73e3d5832b7@huawei.com
[2] https://lore.kernel.org/r/Yoj1AcHoBPqir++H@debian
[3] https://lore.kernel.org/r/20220918043456.147-1-zhujia.zj@bytedance.com
[4] https://lore.kernel.org/r/20230106125330.55529-1-jefflexu@linux.alibaba.com
[5] https://lore.kernel.org/r/Y6KqpGscDV6u5AfQ@B-P7TQMD6M-0146.local
[6] https://lwn.net/SubscriberLink/918893/4d389217f9b8d679
[7] https://lwn.net/Articles/895907

Thanks,
Gao Xiang
