From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Colin Walters <walters@verbum.org>,
Alexander Larsson <alexl@redhat.com>,
lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org,
Amir Goldstein <amir73il@gmail.com>,
Christian Brauner <brauner@kernel.org>,
Jingbo Xu <jefflexu@linux.alibaba.com>,
Giuseppe Scrivano <gscrivan@redhat.com>,
Dave Chinner <david@fromorbit.com>,
Vivek Goyal <vgoyal@redhat.com>,
Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay
Date: Sat, 4 Mar 2023 23:29:20 +0800
Message-ID: <0a571702-a907-c2b1-bb38-96aa7b268a1b@linux.alibaba.com>
In-Reply-To: <4782a0db-5780-4309-badf-67f69507cc81@app.fastmail.com>
Hi Colin,
On 2023/3/4 22:59, Colin Walters wrote:
>
>
> On Fri, Mar 3, 2023, at 12:37 PM, Gao Xiang wrote:
>>
>> Actually since you're container guys, I would like to mention
>> a way to directly reuse OCI tar data and not sure if you
>> have some interest as well, that is just to generate EROFS
>> metadata which could point to the tar blobs so that data itself
>> is still the original tar, but we could add fsverity + IMMUTABLE
>> to these blobs rather than the individual untared files.
>
>> - OCI layer diff IDs in the OCI spec [1] are guaranteed;
>
> The https://github.com/vbatts/tar-split approach addresses this problem domain adequately I think.
Thanks for the interest and comment.

I wasn't aware of this project, and I'm not sure whether tar-split
helps with mounting tar archives; maybe I'm missing something?

As for EROFS, as long as it supports sub-page block sizes, it's
entirely possible to reference the original tar data in place
without modifying the tar stream, since tar already places each
file's data on a 512-byte boundary.
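To illustrate why sub-page block sizes matter here: a tar stream
stores each member's data right after a 512-byte header, padded to
512-byte boundaries, so file data starts at multiples of 512 but
usually not at multiples of 4096. The following minimal Python
sketch is not part of EROFS or its tooling. build_sample_tar,
data_extents and addressable are hypothetical helpers, and the
sample layer contents are made up. It uses the standard tarfile
module to list where each file's data starts inside an uncompressed
tar blob and checks whether that offset could be addressed directly
with a 512-byte versus a 4096-byte block size.

```python
import io
import tarfile

TAR_BLOCK = 512  # tar headers and member data start on 512-byte boundaries


def build_sample_tar() -> bytes:
    """Build a small uncompressed tar in memory (hypothetical sample layer)."""
    buf = io.BytesIO()
    files = [("etc/hostname", b"box\n"), ("usr/bin/tool", b"\x7fELF" + b"\0" * 5000)]
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def data_extents(blob: bytes):
    """Return (name, data_offset, size) for each regular file in the tar blob."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        return [(m.name, m.offset_data, m.size) for m in tar.getmembers() if m.isreg()]


def addressable(offset: int, block_size: int) -> bool:
    """Data can be referenced in place only if it starts on a block boundary."""
    return offset % block_size == 0


if __name__ == "__main__":
    blob = build_sample_tar()
    for name, off, size in data_extents(blob):
        print(name, off, size, addressable(off, TAR_BLOCK), addressable(off, 4096))
```

With the two sample files above, the data offsets are 512 and 1536:
both are 512-byte aligned, but neither is 4096-byte aligned. That is
the gap a sub-page block size closes without rewriting the tar stream.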
>
> Correct me if I'm wrong, but having erofs point to underlying tar wouldn't by default get us page cache sharing or even the "opportunistic" disk sharing that composefs brings, unless userspace did something like attempting to dedup files in the tar stream via hashing and using reflinks on the underlying fs. And then doing reflinks would require alignment inside the stream, right? The https://fedoraproject.org/wiki/Changes/RPMCoW change is very similar in that it's proposing a modification of the RPM format to 4k align files in the
> stream for this reason. But that's exactly it, then it's a new tweaked format and not identical to what came before, so the "compatibility" rationale is actually weakened a lot.
>
Hmm... I think userspace doesn't need to dedupe files in the
tar stream.

As you said, "opportunistic" finer-grained disk sharing across all
tar streams can be handled by reflinks or similar mechanisms in the
underlying filesystem (like XFS) or in virtual block devices (like
device mapper). That's not because EROFS cannot do on-disk dedupe;
it's just that in this mode EROFS only uses the original tar blobs,
so EROFS is not the layer that should solve on-disk sharing. On the
other hand, since the original tar blobs are used, the tar stream
data stays unchanged (with the same diffID) while the container is
running.
As a kernel filesystem, if two files have identical contents, we
could back them with the same inode address space, even if their
inode metadata differs slightly (uid, gid, mode, nlink, etc.). That
is entirely possible for an in-kernel filesystem, even though the
Linux kernel doesn't currently implement such finer-grained page
cache sharing, so EROFS could support page cache sharing of files
across all tar streams if needed.
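A rough userspace model of the content-keyed sharing described above
follows. It is purely hypothetical: Inode and shared_mappings are
made-up names, and this is not how the kernel or EROFS implements
anything. Files from different layers are grouped by a SHA-256
digest of their data alone, so differences in uid, gid or mode do
not keep them apart, and each group stands for one shared page
cache. It assumes the digest can be trusted to identify identical
data; a real implementation would need a verified digest (for
example one backed by fs-verity) or a byte-for-byte comparison.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    """Per-file metadata that may differ even when the data is identical."""
    path: str
    uid: int
    gid: int
    mode: int
    data: bytes


def shared_mappings(inodes):
    """Group inodes by a digest of their contents only, ignoring metadata.

    Each resulting group models one shared cache: identical data would be
    kept in memory once, however many inodes (with whatever uid/gid/mode)
    refer to it.
    """
    groups = {}
    for ino in inodes:
        key = hashlib.sha256(ino.data).hexdigest()
        groups.setdefault(key, []).append(ino.path)
    return groups


if __name__ == "__main__":
    files = [
        Inode("layerA/usr/lib/libfoo.so", 0, 0, 0o755, b"same bytes"),
        Inode("layerB/usr/lib/libfoo.so", 1000, 1000, 0o644, b"same bytes"),
        Inode("layerB/etc/os-release", 0, 0, 0o644, b"other bytes"),
    ]
    for digest, paths in shared_mappings(files).items():
        print(digest[:12], paths)
```

Here the two libfoo.so copies land in one group despite different
owners and modes, while os-release stays in its own group.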
Thanks,
Gao Xiang