linux-fsdevel.vger.kernel.org archive mirror
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Alexander Larsson <alexl@redhat.com>, lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org,
	Amir Goldstein <amir73il@gmail.com>,
	Christian Brauner <brauner@kernel.org>,
	Jingbo Xu <jefflexu@linux.alibaba.com>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay
Date: Mon, 6 Mar 2023 20:15:27 +0800
Message-ID: <e595f640-53e2-cd6a-f169-e1567c4ff73b@linux.alibaba.com>
In-Reply-To: <CAL7ro1FZMRiep582LaiaqqxzYq_XeM2UMxvsHoT-guf_-bqSfg@mail.gmail.com>



On 2023/3/6 19:33, Alexander Larsson wrote:
> On Fri, Mar 3, 2023 at 2:57 PM Alexander Larsson <alexl@redhat.com> wrote:
>>
>> On Mon, Feb 27, 2023 at 10:22 AM Alexander Larsson <alexl@redhat.com> wrote:
>>>
>>> Hello,
>>>
>>> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
>>> Composefs filesystem. It is an opportunistically sharing, validating
>>> image-based filesystem, targeting usecases like validated ostree
>>> rootfs:es, validated container images that share common files, as well
>>> as other image based usecases.
>>>
>>> During the discussions in the composefs proposal (as seen on LWN[3])
>>> it has been proposed that (with some changes to overlayfs), similar
>>> behaviour can be achieved by combining the overlayfs
>>> "overlay.redirect" xattr with a read-only filesystem such as erofs.
>>>
>>> There are pros and cons to both these approaches, and the discussion
>>> about their respective value has sometimes been heated. We would like
>>> to have an in-person discussion at the summit, ideally also involving
>>> more of the filesystem development community, so that we can reach
>>> some consensus on what is the best approach.
>>
>> In order to better understand the behaviour and requirements of the
>> overlayfs+erofs approach I spent some time implementing direct support
>> for erofs in libcomposefs. So, with current HEAD of
>> github.com/containers/composefs you can now do:
>>
>> $ mkcompose --digest-store=objects --format=erofs source-dir image.erofs
>>
>> This will produce an object store with the backing files, and an erofs
>> file with the required overlayfs xattrs, including a made up one
>> called "overlay.fs-verity" containing the expected fs-verity digest
>> for the lower dir. It also adds the required whiteouts to cover the
>> 00-ff dirs from the lower dir.
>>
>> These erofs files are ordered similarly to the composefs files, and we
>> give similar guarantees about their reproducibility, etc. So, they
>> should be apples-to-apples comparable with the composefs images.
>>
>> Given this, I ran another set of performance tests on the original cs9
>> rootfs dataset, again measuring the time of `ls -lR`. I also tried to
>> measure the memory use like this:
>>
>> # echo 3 > /proc/sys/vm/drop_caches
>> # systemd-run --scope sh -c 'ls -lR mountpoint > /dev/null; cat $(cat /proc/self/cgroup | sed -e "s|0::|/sys/fs/cgroup|")/memory.peak'
>>
>> These are the alternatives I tried:
>>
>> xfs: the source of the image, regular dir on xfs
>> erofs: the image.erofs above, on loopback
>> erofs dio: the image.erofs above, on loopback with --direct-io=on
>> ovl: erofs above combined with overlayfs
>> ovl dio: erofs dio above combined with overlayfs
>> cfs: composefs mount of image.cfs
>>
>> All tests use the same objects dir, stored on xfs. The erofs and
>> overlay implementations are from a stock 6.1.13 kernel, and composefs
>> module is from github HEAD.
>>
>> I tried loopback both with and without the direct-io option, because
>> without direct-io enabled the kernel will double-cache the loopbacked
>> data, as per[1].
>>
>> The produced images are:
>>   8.9M image.cfs
>> 11.3M image.erofs
>>
>> And gives these results:
>>            | Cold cache | Warm cache | Mem use
>>            |   (msec)   |   (msec)   |  (MB)
>> -----------+------------+------------+---------
>> xfs        |   1449     |    442     |    54
>> erofs      |    700     |    391     |    45
>> erofs dio  |    939     |    400     |    45
>> ovl        |   1827     |    530     |   130
>> ovl dio    |   2156     |    531     |   130
>> cfs        |    689     |    389     |    51
> 
> It has been noted that the readahead done by kernel_read() may cause
> read-ahead of unrelated data into memory which skews the results in
> favour of workloads that consume all the filesystem metadata (such as
> the ls -lR usecase of the above test). In the table above this favours
> composefs (which uses kernel_read in some codepaths) as well as
> non-dio erofs (non-dio loopback device uses readahead too).
> 
> I updated composefs to not use kernel_read here:
>    https://github.com/containers/composefs/pull/105
> 
> And a new kernel patch-set based on this is available at:
>    https://github.com/alexlarsson/linux/tree/composefs
> 
> The resulting table is now (dropping the non-dio erofs):
> 
>            | Cold cache | Warm cache | Mem use
>            |   (msec)   |   (msec)   |  (MB)
> -----------+------------+------------+---------
> xfs        |   1449     |    442     |    54
> erofs dio  |    939     |    400     |    45
> ovl dio    |   2156     |    531     |   130
> cfs        |    833     |    398     |    51
> 
>            | Cold cache | Warm cache | Mem use
>            |   (msec)   |   (msec)   |  (MB)
> -----------+------------+------------+---------
> ext4       |   1135     |    394     |    54
> erofs dio  |    922     |    401     |    45
> ovl dio    |   1810     |    532     |   149
> ovl lazy   |   1063     |    523     |    87
> cfs        |    768     |    459     |    51
> 
> So, while cfs is somewhat worse now for this particular usecase, my
> overall analysis still stands.

We will investigate it later.  You may still need to test workloads
other than "ls -lR" (such as stat of ~1000 randomly chosen files [1])
rather than completely ignoring my and Jingbo's comments, or at least
explain why "ls -lR" is the only criterion on your side.
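
To make the kind of random workload meant here concrete, below is a
minimal sketch of such a test. It is not the script used in [1]; the
function names, the fixed seed, and the split into "pick paths" and
"time the stats" are assumptions made for illustration only.

```python
import os
import random
import time


def pick_random_paths(list_root, count=1000, seed=0):
    """Return up to `count` randomly chosen file paths, relative to list_root.

    The walk itself touches every dentry, so it is meant to be run on a
    different copy of the tree (or before dropping caches), not on the
    mount that is about to be measured.
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(list_root):
        for name in filenames:
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), list_root))
    rel_paths.sort()                     # walk order depends on the filesystem
    rng = random.Random(seed)            # fixed seed: same sample on every run
    return rng.sample(rel_paths, min(count, len(rel_paths)))


def time_stats(mount_root, rel_paths):
    """lstat() each path under mount_root; return (files statted, elapsed msec)."""
    start = time.perf_counter()
    for rel in rel_paths:
        os.lstat(os.path.join(mount_root, rel))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return len(rel_paths), elapsed_ms
```

One way to use it: build the path list from the source directory, drop
caches as in the quoted test (echo 3 > /proc/sys/vm/drop_caches), then
call time_stats() once per mountpoint (erofs, ovl, cfs) with the same
list, so that every filesystem is asked for exactly the same files.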

My point is simple.  If you are open to getting an improved EROFS to
some extent, we do hope we can improve your "ls -lR" case as much as
possible without a bad impact on random access.  Or, if you'd like to
upstream a new file-based stackable filesystem for these ostree-specific
use cases and whatever KPIs you have anyway, I don't think we can reach
a conclusion here, and I cannot help you since I'm not the one to
decide that.

You're addressing one very specific workload, "ls -lR", and even
without further investigation EROFS, as well as EROFS + overlayfs,
doesn't perform that badly compared with Composefs, even though EROFS
doesn't directly use file-based interfaces.

Thanks,
Gao Xiang

[1] https://lore.kernel.org/r/83829005-3f12-afac-9d05-8ba721a80b4d@linux.alibaba.com
