linux-fsdevel.vger.kernel.org archive mirror
From: Alexander Larsson <alexl@redhat.com>
To: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>,
	Christian Brauner <brauner@kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org, lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay
Date: Fri, 3 Mar 2023 15:41:33 +0100
Message-ID: <CAL7ro1E_g9M1S6Eg45B63Sdfif4qrj7rdYSyWEW_OaOD833dUA@mail.gmail.com>
In-Reply-To: <83829005-3f12-afac-9d05-8ba721a80b4d@linux.alibaba.com>

On Wed, Mar 1, 2023 at 4:47 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
> Hi all,
>
> On 2/27/23 6:45 PM, Gao Xiang wrote:
> >
> > (+cc Jingbo Xu and Christian Brauner)
> >
> > On 2023/2/27 17:22, Alexander Larsson wrote:
> >> Hello,
> >>
> >> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
> >> Composefs filesystem. It is an opportunistically sharing, validating
> >> image-based filesystem, targeting usecases like validated ostree
> >> rootfs:es, validated container images that share common files, as well
> >> as other image based usecases.
> >>
> >> During the discussions in the composefs proposal (as seen on LWN[3])
> >> it has been proposed that (with some changes to overlayfs), similar
> >> behaviour can be achieved by combining the overlayfs
> >> "overlay.redirect" xattr with a read-only filesystem such as erofs.
> >>
> >> There are pros and cons to both these approaches, and the discussion
> >> about their respective value has sometimes been heated. We would like
> >> to have an in-person discussion at the summit, ideally also involving
> >> more of the filesystem development community, so that we can reach
> >> some consensus on what is the best approach.
> >>
> >> Good participants would be at least: Alexander Larsson, Giuseppe
> >> Scrivano, Amir Goldstein, David Chinner, Gao Xiang, Miklos Szeredi,
> >> Jingbo Xu
> > I'd be happy to discuss this at LSF/MM/BPF this year. Also, we've found
> > that the root cause of the performance gap is this:
> >
> > composefs reads symlink-like payload data using
> > cfs_read_vdata_path(), which involves kernel_read() and triggers heuristic
> > readahead of dir data (which also lands in the composefs vdata area
> > together with the payload), so most composefs dir I/O is already done
> > in advance by heuristic readahead.  We think almost all existing
> > in-kernel local fses don't have such heuristic readahead, and if we add
> > similar stuff, EROFS could do better than composefs.
> >
> > Also, we've tried random stat()s of about 500~1000 files in the tree you
> > shared (rather than just "ls -lR") and EROFS did almost the same or better
> > than composefs.  I guess further analysis (including blktrace) could be
> > shown by Jingbo later.
> >
>
> The link path strings and dirents are stored mixed together in a so-called
> vdata (variable data) section[1] in composefs, sometimes even in the same
> block (figured out by dumping the composefs image).  When doing a lookup,
> composefs will resolve the link path.  It will read the link path string
> from the vdata section through kernel_read(), along which those dirents in
> the following blocks are also read in by the heuristic readahead
> algorithm in kernel_read().  I believe this greatly benefits
> performance in workloads like "ls -lR".

This is interesting stuff, and honestly I'm a bit surprised other
filesystems don't try to readahead directory metadata to some degree
too. It seems inherent to all filesystems that they try to pack
related metadata near each other, so readahead would probably be
useful even for read-write filesystems, although even more so for
read-only filesystems (due to lack of fragmentation).

But anyway, this is sort of beside the point. There is nothing
inherent in composefs that requires it to do readahead like this,
and correspondingly, if it is a good idea to do it, erofs could do it
too.
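
As a rough illustration of the readahead effect discussed above, here is
a toy model in Python. It is only a sketch under stated assumptions, not
composefs or erofs code: the ToyPageCache class, the fixed readahead
window and the block layout (one block with a link path string followed
by seven dirent blocks) are all hypothetical. It just counts how many
I/O requests reach the "device" for an "ls -lR"-like sequential walk,
with and without readahead.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPageCache:
    """Toy block cache with a fixed-size sequential readahead window."""
    readahead_blocks: int                      # extra blocks fetched on each miss
    cached: set = field(default_factory=set)   # block numbers currently cached
    device_reads: int = 0                      # I/O requests sent to the "device"

    def read_block(self, block: int) -> None:
        if block in self.cached:
            return  # cache hit: no I/O needed
        # Cache miss: one I/O request that also pulls in the following blocks.
        self.device_reads += 1
        self.cached.update(range(block, block + self.readahead_blocks + 1))


def count_device_reads(access_pattern, readahead_blocks):
    """Replay a list of block numbers and return the number of device reads."""
    cache = ToyPageCache(readahead_blocks)
    for block in access_pattern:
        cache.read_block(block)
    return cache.device_reads


if __name__ == "__main__":
    # Hypothetical layout: block 0 holds a link path string,
    # blocks 1..7 hold the dirents that are read right afterwards.
    pattern = list(range(8))
    print("no readahead:  ", count_device_reads(pattern, readahead_blocks=0))
    print("with readahead:", count_device_reads(pattern, readahead_blocks=7))
```

In this model the first read of the link path block pulls in the dirent
blocks behind it, so the later directory reads are cache hits. Nothing
in it is specific to one on-disk format, which is the point made above.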

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@redhat.com         alexander.larsson@gmail.com

