From: "Colin Walters" <walters@verbum.org>
To: "Gao Xiang" <hsiangkao@linux.alibaba.com>,
"Alexander Larsson" <alexl@redhat.com>,
lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org,
"Amir Goldstein" <amir73il@gmail.com>,
"Christian Brauner" <brauner@kernel.org>,
"Jingbo Xu" <jefflexu@linux.alibaba.com>,
"Giuseppe Scrivano" <gscrivan@redhat.com>,
"Dave Chinner" <david@fromorbit.com>,
"Vivek Goyal" <vgoyal@redhat.com>,
"Miklos Szeredi" <miklos@szeredi.hu>
Subject: Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay
Date: Mon, 06 Mar 2023 20:00:45 -0500
Message-ID: <6bea16fa-737f-4aad-a2cd-0a12029e614d@app.fastmail.com>
In-Reply-To: <0a571702-a907-c2b1-bb38-96aa7b268a1b@linux.alibaba.com>
On Sat, Mar 4, 2023, at 10:29 AM, Gao Xiang wrote:
> Hi Colin,
>
> On 2023/3/4 22:59, Colin Walters wrote:
>>
>>
>> On Fri, Mar 3, 2023, at 12:37 PM, Gao Xiang wrote:
>>>
>>> Actually since you're container guys, I would like to mention
>>> a way to directly reuse OCI tar data and not sure if you
>>> have some interest as well, that is just to generate EROFS
>>> metadata which could point to the tar blobs so that data itself
>>> is still the original tar, but we could add fsverity + IMMUTABLE
>>> to these blobs rather than the individual untared files.
>>
>>> - OCI layer diff IDs in the OCI spec [1] are guaranteed;
>>
>> The https://github.com/vbatts/tar-split approach addresses this problem domain adequately I think.
>
> Thanks for the interest and comment.
>
> I'm not aware of this project, and I'm not sure if tar-split
> helps mount tar stuffs, maybe I'm missing something?
Not directly; it's widely used in the container ecosystem (podman, docker, etc.) to split the original bit-for-bit tar stream metadata from the bulk data (particularly regular file contents), so that one can write the files to a regular underlying fs (xfs, ext4, etc.) and use overlayfs on top. It then helps reverse the process and reconstruct the original tar stream for pushes, for exactly the reason you mention.
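To make the split/reconstruct idea concrete, here is a minimal Python sketch. To be clear, this is not tar-split's actual format or API: the names split_tar and reassemble, and the sha256-keyed dict standing in for files written to the underlying fs, are made up for illustration. It also assumes an uncompressed tar stream with no sparse entries.

```python
import hashlib
import io
import tarfile


def split_tar(raw: bytes):
    """Split a tar stream into an ordered recipe plus a content-addressed store."""
    recipe = []  # ordered entries: ("meta", bytes) or ("file", sha256 hex digest)
    store = {}   # digest -> payload; stands in for files on xfs/ext4 (assumption)
    pos = 0
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        for member in tar:
            if not member.isreg() or member.size == 0:
                continue  # headers of non-regular entries stay in the metadata
            start = member.offset_data
            end = start + member.size
            # Everything since the last payload (headers, padding) is kept verbatim.
            recipe.append(("meta", raw[pos:start]))
            payload = raw[start:end]
            digest = hashlib.sha256(payload).hexdigest()
            store[digest] = payload
            recipe.append(("file", digest))
            pos = end
    # Trailing padding and end-of-archive zero blocks.
    recipe.append(("meta", raw[pos:]))
    return recipe, store


def reassemble(recipe, store) -> bytes:
    """Rebuild the original byte stream from the recipe and the stored payloads."""
    return b"".join(
        data if kind == "meta" else store[data] for kind, data in recipe
    )


def demo_tar() -> bytes:
    """Build a small in-memory tar with two files that share identical content."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("a.txt", "b.txt"):
            data = b"hello layer\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Calling split_tar(demo_tar()) and feeding the result to reassemble() should give back the input byte for byte, which is the property that keeps the layer digest stable on push. Keying the store by content digest is my own choice here; it just shows that identical payloads only need to be stored once.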
Slightly OT, but a big reason we're having this conversation now is rooted in the original Docker inventor having the idea of *deriving* from, or layering on top of, previous images, which is not part of dpkg/rpm, squashfs, raw disk images, etc. Inherent in this is the idea that we're not talking about *a* filesystem - we're talking about filesystem*s* plural and how they're wired together and stacked.
It's really only very simplistic use cases for which a single read-only filesystem suffices. They exist - e.g. people booting things like Tails OS https://tails.boum.org/ on one of those USB sticks with a physical write protection switch, etc.
But that approach makes every OS update very expensive. Most use cases really want fast and efficient incremental in-place OS updates and a clear split between the OS filesystem and app filesystems, without also forcing separate size management onto both.
> Not because EROFS cannot do on-disk dedupe, just because in this
> way EROFS can only use the original tar blobs, and EROFS is not
> the guy to resolve the on-disk sharing stuff.
Right, agree; this ties into my larger point above that no one technology/filesystem is the sole solution in the general case.
> As a kernel filesystem, if two files are equal, we could treat them
> in the same inode address space, even they are actually with slightly
> different inode metadata (uid, gid, mode, nlink, etc). That is
> entirely possible as an in-kernel filesystem even currently linux
> kernel doesn't implement finer page cache sharing, so EROFS can
> support page-cache sharing of files in all tar streams if needed.
Hmmm. I should clarify here that I have zero kernel patches; I'm a userspace developer (working on containers and OS updates, for which I'd like a unified stack). But it seems to me that while you're right that it would be technically possible for a single filesystem to do this, in practice it would require some sort of virtual sub-filesystem internally. And at that point, it does seem more elegant to me to make that stacking explicit, more like how composefs is doing it.
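As a rough picture of what I mean by explicit stacking, here is a toy userspace model in Python. It is purely illustrative: the Inode class, the objects dict and read() are hypothetical names, and none of this reflects how composefs or EROFS actually lay things out on disk or in the kernel.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    # Per-image metadata; free to differ between images.
    uid: int
    gid: int
    mode: int
    digest: str  # key into the shared object store


# Shared, content-addressed backing store: one entry per unique file content.
content = b"#!/bin/sh\necho hi\n"
digest = hashlib.sha256(content).hexdigest()
objects = {digest: content}

# Two metadata-only "images" referencing the same content with different ownership.
image_a = {"/usr/bin/hi": Inode(uid=0, gid=0, mode=0o755, digest=digest)}
image_b = {"/opt/app/hi": Inode(uid=1000, gid=1000, mode=0o700, digest=digest)}


def read(image, path) -> bytes:
    """Resolve the path in the metadata layer, then fetch data from the shared store."""
    return objects[image[path].digest]
```

Both images resolve to the very same backing object even though their inode metadata differs; the sharing falls out of the layering rather than being an internal detail of one filesystem.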
That said, I think there's a lot of legitimate debate here, and I hope we can continue it productively!