From: Alexander Larsson <alexl@redhat.com>
To: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	Amir Goldstein <amir73il@gmail.com>,
	Christian Brauner <brauner@kernel.org>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [LSF/MM/BPF TOPIC] Composefs vs erofs+overlay
Date: Mon, 6 Mar 2023 17:09:01 +0100	[thread overview]
Message-ID: <CAL7ro1GwDF1201StXw8xL9xL6y4jW1t+cbLPOmsRUp574+ewQQ@mail.gmail.com> (raw)
In-Reply-To: <e81d3776-8239-b8fa-1c64-bdb6f5cbe4df@linux.alibaba.com>

On Mon, Mar 6, 2023 at 4:49 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> On 3/6/23 7:33 PM, Alexander Larsson wrote:
> > On Fri, Mar 3, 2023 at 2:57 PM Alexander Larsson <alexl@redhat.com> wrote:
> >>
> >> On Mon, Feb 27, 2023 at 10:22 AM Alexander Larsson <alexl@redhat.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
> >>> Composefs filesystem. It is an opportunistically sharing, validating,
> >>> image-based filesystem, targeting use cases like validated ostree
> >>> root filesystems, validated container images that share common files,
> >>> and other image-based use cases.
> >>>
> >>> During the discussion of the composefs proposal (as covered on
> >>> LWN[3]) it has been proposed that, with some changes to overlayfs,
> >>> similar behaviour can be achieved by combining the overlayfs
> >>> "overlay.redirect" xattr with a read-only filesystem such as erofs.
> >>>
> >>> There are pros and cons to both these approaches, and the discussion
> >>> about their respective value has sometimes been heated. We would like
> >>> to have an in-person discussion at the summit, ideally also involving
> >>> more of the filesystem development community, so that we can reach
> >>> some consensus on what is the best approach.
> >>
> >> In order to better understand the behaviour and requirements of the
> >> overlayfs+erofs approach I spent some time implementing direct support
> >> for erofs in libcomposefs. So, with current HEAD of
> >> github.com/containers/composefs you can now do:
> >>
> >> $ mkcomposefs --digest-store=objects --format=erofs source-dir image.erofs
> >>
> >> This will produce an object store with the backing files, and an erofs
> >> file with the required overlayfs xattrs, including a made-up one
> >> called "overlay.fs-verity" containing the expected fs-verity digest
> >> for the lower dir. It also adds the required whiteouts to cover the
> >> 00-ff dirs from the lower dir.
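> >>
> >> You can sanity-check what mkcomposefs generated by mounting the erofs
> >> image directly and dumping the xattrs; roughly (file path made up):
> >>
> >> # mount -t erofs -o ro,loop image.erofs /mnt/img
> >> # getfattr -d -m '' /mnt/img/usr/bin/bash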
> >>
> >> These erofs files are ordered similarly to the composefs files, and we
> >> give similar guarantees about their reproducibility, etc. So, they
> >> should be apples-to-apples comparable with the composefs images.
> >>
> >> Given this, I ran another set of performance tests on the original cs9
> >> rootfs dataset, again measuring the time of `ls -lR`. I also tried to
> >> measure the memory use like this:
> >>
> >> # echo 3 > /proc/sys/vm/drop_caches
> >> # systemd-run --scope sh -c 'ls -lR mountpoint > /dev/null; cat $(cat
> >> /proc/self/cgroup | sed -e "s|0::|/sys/fs/cgroup|")/memory.peak'
> >>
> >> These are the alternatives I tried:
> >>
> >> xfs: the source of the image, regular dir on xfs
> >> erofs: the image.erofs above, on loopback
> >> erofs dio: the image.erofs above, on loopback with --direct-io=on
> >> ovl: erofs above combined with overlayfs
> >> ovl dio: erofs dio above combined with overlayfs
> >> cfs: composefs mount of image.cfs
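> >>
> >> For reference, the ovl dio case is assembled roughly like this
> >> (a sketch; paths made up, and the exact overlayfs options may vary
> >> by kernel):
> >>
> >> # loopdev=$(losetup --find --show --direct-io=on image.erofs)
> >> # mount -t erofs -o ro "$loopdev" /mnt/erofs
> >> # mount -t overlay overlay -o ro,redirect_dir=follow,metacopy=on \
> >>     -o lowerdir=/mnt/erofs:/path/to/objects /mnt/ovl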
> >>
> >> All tests use the same objects dir, stored on xfs. The erofs and
> >> overlayfs implementations are from a stock 6.1.13 kernel, and the
> >> composefs module is from github HEAD.
> >>
> >> I tried loopback both with and without the direct-io option, because
> >> without direct-io enabled the kernel will double-cache the loopback
> >> data, as per [1].
> >>
> >> The produced images are:
> >>  8.9M image.cfs
> >> 11.3M image.erofs
> >>
> >> And the results are:
> >>            | Cold cache | Warm cache | Mem use
> >>            |   (msec)   |   (msec)   |  (mb)
> >> -----------+------------+------------+---------
> >> xfs        |   1449     |    442     |    54
> >> erofs      |    700     |    391     |    45
> >> erofs dio  |    939     |    400     |    45
> >> ovl        |   1827     |    530     |   130
> >> ovl dio    |   2156     |    531     |   130
> >> cfs        |    689     |    389     |    51
> >
> > It has been noted that the readahead done by kernel_read() may pull
> > unrelated data into memory, which skews the results in favour of
> > workloads that consume all the filesystem metadata (such as the
> > ls -lR use case of the above test). In the table above this favours
> > composefs (which uses kernel_read in some codepaths) as well as
> > non-dio erofs (the non-dio loopback device uses readahead too).
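> >
> > (As an aside, when comparing, loop device readahead can also be
> > capped directly, e.g. something like:
> >
> >   # blockdev --setra 0 /dev/loopN
> >
> > but the numbers below just rely on the dio/non-dio split.)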
> >
> > I updated composefs to not use kernel_read here:
> >   https://github.com/containers/composefs/pull/105
> >
> > And a new kernel patch-set based on this is available at:
> >   https://github.com/alexlarsson/linux/tree/composefs
> >
> > The resulting tables (dropping the non-dio erofs), first with the xfs
> > baseline as before and then with an ext4 baseline, are:
> >
> >            | Cold cache | Warm cache | Mem use
> >            |   (msec)   |   (msec)   |  (mb)
> > -----------+------------+------------+---------
> > xfs        |   1449     |    442     |   54
> > erofs dio  |    939     |    400     |   45
> > ovl dio    |   2156     |    531     |  130
> > cfs        |    833     |    398     |   51
> >
> >            | Cold cache | Warm cache | Mem use
> >            |   (msec)   |   (msec)   |  (mb)
> > -----------+------------+------------+---------
> > ext4       |   1135     |    394     |    54
> > erofs dio  |    922     |    401     |    45
> > ovl dio    |   1810     |    532     |   149
> > ovl lazy   |   1063     |    523     |    87
> > cfs        |    768     |    459     |    51
> >
> > So, while cfs is somewhat worse now for this particular use case, my
> > overall analysis still stands.
> >
>
> Hi,
>
> I tested your patch removing kernel_read(), and here is the statistics
> tested in my environment.
>
>
> Setup
> ======
> CPU: x86_64 Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> Disk: cloud disk, 11800 IOPS upper limit
> OS: Linux v6.2
> FS of backing objects: xfs
>
>
> Image size
> ===========
> 8.6M large.composefs (with --compute-digest)
> 8.9M large.erofs (mkfs.erofs)
> 11M  large.cps.in.erofs (mkfs.composefs --compute-digest --format=erofs)
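>
> That is, generated roughly as follows (source dir path made up, flags
> as discussed above; exact argument order may differ):
>
> $ mkfs.composefs --compute-digest large-rootfs large.composefs
> $ mkfs.erofs large.erofs large-rootfs
> $ mkfs.composefs --compute-digest --format=erofs large-rootfs large.cps.in.erofs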
>
>
> Perf of "ls -lR"
> ================
>                                                    | uncached | cached
>                                                    |   (ms)   |  (ms)
> ---------------------------------------------------|----------|--------
> composefs                                          |   519    |  178
> erofs (mkfs.erofs, DIRECT loop)                    |   497    |  192
> erofs (mkfs.composefs --format=erofs, DIRECT loop) |   536    |  199
>
> I tested the performance of "ls -lR" on the whole tree of the
> cs9-developer-rootfs.  It seems that the performance of erofs
> (generated with mkfs.erofs) is slightly better than that of composefs,
> while the performance of erofs generated with mkfs.composefs is
> slightly worse than that of composefs.

I suspect that the reason for the lower performance of the
mkfs.composefs image is the overlay.fs-verity xattr added to all the
files. It makes the image larger, and that means more I/O.
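
A quick way to test that theory would be to build the image with and
without the digest xattrs and compare sizes, e.g. (untested sketch,
assuming --compute-digest is what adds the overlay.fs-verity xattrs):

$ mkcomposefs --format=erofs source-dir plain.erofs
$ mkcomposefs --compute-digest --format=erofs source-dir verity.erofs
$ ls -l plain.erofs verity.erofs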

> The uncached performance is somewhat different from that given by
> Alexander Larsson.  I think it may be due to the different test
> environment, as my test machine is a server with robust performance
> and a cloud disk as storage.
>
> It's just a simple test without further analysis, as it's a bit late for
> me :)

Yeah, and for the record, I'm not claiming that my tests contain any
high degree of analysis or rigour either. They are short simple test
runs that give a rough estimate of the overall performance of metadata
operations. What is interesting here is if there are large or
unexpected differences, and from that point of view our results are
basically the same.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@redhat.com         alexander.larsson@gmail.com



Thread overview: 42+ messages
2023-02-27  9:22 [LSF/MM/BPF TOPIC] Composefs vs erofs+overlay Alexander Larsson
2023-02-27 10:45 ` Gao Xiang
2023-02-27 10:58   ` Christian Brauner
2023-04-27 16:11     ` [Lsf-pc] " Amir Goldstein
2023-03-01  3:47   ` Jingbo Xu
2023-03-03 14:41     ` Alexander Larsson
2023-03-03 15:48       ` Gao Xiang
2023-02-27 11:37 ` Jingbo Xu
2023-03-03 13:57 ` Alexander Larsson
2023-03-03 15:13   ` Gao Xiang
2023-03-03 17:37     ` Gao Xiang
2023-03-04 14:59       ` Colin Walters
2023-03-04 15:29         ` Gao Xiang
2023-03-04 16:22           ` Gao Xiang
2023-03-07  1:00           ` Colin Walters
2023-03-07  3:10             ` Gao Xiang
2023-03-07 10:15     ` Christian Brauner
2023-03-07 11:03       ` Gao Xiang
2023-03-07 12:09       ` Alexander Larsson
2023-03-07 12:55         ` Gao Xiang
2023-03-07 15:16         ` Christian Brauner
2023-03-07 19:33           ` Giuseppe Scrivano
2023-03-08 10:31             ` Christian Brauner
2023-03-07 13:38       ` Jeff Layton
2023-03-08 10:37         ` Christian Brauner
2023-03-04  0:46   ` Jingbo Xu
2023-03-06 11:33   ` Alexander Larsson
2023-03-06 12:15     ` Gao Xiang
2023-03-06 15:49     ` Jingbo Xu
2023-03-06 16:09       ` Alexander Larsson [this message]
2023-03-06 16:17         ` Gao Xiang
2023-03-07  8:21           ` Alexander Larsson
2023-03-07  8:33             ` Gao Xiang
2023-03-07  8:48               ` Gao Xiang
2023-03-07  9:07               ` Alexander Larsson
2023-03-07  9:26                 ` Gao Xiang
2023-03-07  9:38                   ` Gao Xiang
2023-03-07  9:56                     ` Alexander Larsson
2023-03-07 10:06                       ` Gao Xiang
2023-03-07  9:46                   ` Alexander Larsson
2023-03-07 10:01                     ` Gao Xiang
2023-03-07 10:00       ` Jingbo Xu
