linux-fsdevel.vger.kernel.org archive mirror
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Alexander Larsson <alexl@redhat.com>
Cc: Jingbo Xu <jefflexu@linux.alibaba.com>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	Amir Goldstein <amir73il@gmail.com>,
	Christian Brauner <brauner@kernel.org>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay
Date: Tue, 7 Mar 2023 18:01:35 +0800
Message-ID: <1481097f-e534-8587-a86c-bdf22eea8946@linux.alibaba.com>
In-Reply-To: <CAL7ro1GWQvF+u9eChhDiBcm-YCWiWGSafHJezOSq5K2j-tQfrw@mail.gmail.com>



On 2023/3/7 17:46, Alexander Larsson wrote:
> On Tue, Mar 7, 2023 at 10:26 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>> On 2023/3/7 17:07, Alexander Larsson wrote:
>>> On Tue, Mar 7, 2023 at 9:34 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2023/3/7 16:21, Alexander Larsson wrote:
>>>>> On Mon, Mar 6, 2023 at 5:17 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>>>
>>>>>>>> I tested the performance of "ls -lR" on the whole tree of
>>>>>>>> cs9-developer-rootfs.  It seems that the performance of erofs (generated
>>>>>>>> from mkfs.erofs) is slightly better than that of composefs.  While the
>>>>>>>> performance of erofs generated from mkfs.composefs is slightly worse
>>>>>>>> than that of composefs.
>>>>>>>
>>>>>>> I suspect that the reason for the lower performance of mkfs.composefs
>>>>>>> is the added overlay.fs-verity xattr to all the files. It makes the
>>>>>>> image larger, and that means more i/o.
>>>>>>
>>>>>> Actually you could move overlay.fs-verity to EROFS shared xattr area (or
>>>>>> even overlay.redirect but it depends) if needed, which could save some
>>>>>> I/Os for your workloads.
>>>>>>
>>>>>> Shared xattrs can be used in this way as well if you care about such a
>>>>>> minor difference.  Actually, I think inlined xattrs for your workload
>>>>>> are only meaningful for selinux labels and capabilities.
>>>>>
>>>>> Really? Could you expand on this, because I would think it will be
>>>>> sort of the opposite. In my usecase, the erofs fs will be read by
>>>>> overlayfs, which will probably access overlay.* pretty often.  At the
>>>>> very least it will load overlay.metacopy and overlay.redirect for
>>>>> every lookup.
>>>>
>>>> Really.  That way, it will behave much like the composefs on-disk
>>>> arrangement now (in the composefs vdata area).
>>>>
>>>> That is because, although an extra I/O is needed for verification,
>>>> it only happens when actually opening the file (so "ls -lR" is
>>>> not impacted), and on-disk inodes are more compact.
>>>>
>>>> All EROFS xattrs will be cached in memory so that accessing
>>>> overlay.* pretty often is not greatly impacted due to no real I/Os
>>>> (IOWs, only some CPU time is consumed).
>>>
>>> So, I tried moving the overlay.digest xattr to the shared area, but
>>> actually this made the performance worse for the ls case. I have not
>>
>> That is rather strange.  We'd like to look into it if needed.  BTW, did
>> you test EROFS with acl enabled all the time?
> 
> These were all with acl enabled.
> 
> And, to test this, I compared "ls -lR" and "ls -ZR", which do the same
> per-file syscalls, except the latter doesn't try to read the
> system.posix_acl_access xattr. The result is:
> 
> xattr:        inlined | not inlined
> ------------+---------+------------
> ls -lR cold |  708    |  721
> ls -lR warm |  415    |  412
> ls -ZR cold |  522    |  512
> ls -ZR warm |  283    |  279
> 
> In the ZR case the out-of-band digest is a win, but not in the lR
> case, which seems to mean the failed lookup of the acl xattr is to
> blame here.
> 
> Also, very interesting is the fact that the warm cache difference for
> these two is so large. I guess that is because most other inode data is
> cached, but the xattrs lookups are not. If you could cache negative
> xattr lookups that seems like a large win. This can be either via a
> bloom cache in the disk format or maybe even just some in-memory
> negative lookup caches for the inode, maybe even special casing the
> acl xattrs.

Yes, agreed.  Actually, we haven't spent much time looking into the ACL
impact, because almost all generic filesystems (such as ext4, XFS, btrfs,
etc.) implement ACLs.  But you could use "-o noacl" to disable ACLs if
needed with the current codebase.
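
For reference, the relative differences in the table quoted above can be
worked out with a few lines of Python.  This is only arithmetic on the
numbers as posted; the unit is not stated in the thread, and the helper
name relative_change is made up for this illustration.

```python
# Timings copied from the quoted table as (inlined, not inlined).
# The unit is not stated in the thread, so only ratios are computed.
timings = {
    "ls -lR cold": (708, 721),
    "ls -lR warm": (415, 412),
    "ls -ZR cold": (522, 512),
    "ls -ZR warm": (283, 279),
}


def relative_change(inlined, not_inlined):
    """Percent change when the digest xattr is moved out of the inode."""
    return (not_inlined - inlined) / inlined * 100.0


for name, (inlined, not_inlined) in timings.items():
    print(f"{name}: {relative_change(inlined, not_inlined):+.1f}%")

# Extra cost of "ls -lR" over "ls -ZR" with a warm cache and inlined xattrs.
lr_warm = timings["ls -lR warm"][0]
zr_warm = timings["ls -ZR warm"][0]
acl_overhead = (lr_warm - zr_warm) / zr_warm * 100.0
print(f"-lR vs -ZR (warm, inlined): {acl_overhead:+.1f}%")
```

The per-row changes are within about two percent either way, while the
gap between -lR and -ZR in the warm case is far larger, which fits the
point that the ACL lookup matters more here than the xattr placement.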

> 
>>> looked into the cause in detail, but my guess is that ls looks for the
>>> acl xattr, and such a negative lookup will cause erofs to look at all
>>> the shared xattrs for the inode, which means they all end up being
>>> loaded anyway. Of course, this will only affect ls (or other cases
>>> that read the acl), so it's perhaps a bit uncommon.
>>
>> Yeah, in addition to that, I guess real ACLs could land in inlined
>> xattrs as well if they exist...
> 
> Yeah, but that doesn't help with the case where they don't exist.
> 
>> BTW, if you have more interest in this way, we could get in
>> touch in a more effective way to improve EROFS in addition to
>> community emails except for the userns stuff
> 
> I don't really have time to do any real erofs specific work. These are
> just some ideas that I got looking at these results.

I don't want you guys to do any EROFS-specific work.  I just want to
confirm your real requirements (so I can improve this) and the final
goal of this discussion.

At least on my side, after this long discussion and comparison, EROFS
and composefs look very similar from many points of view, except for
some interfaces (when EROFS was raised, we didn't have a better choice
for getting good performance, since you had already partially
benchmarked other filesystems).  And since composefs doesn't implement
ACLs now, EROFS could perform better if you mount it with "-o noacl".
So I think there is no need to discuss "ls -lR" any further here; if
you disagree, we could take more time to investigate this.

In other words, the EROFS on-disk format and loopback devices are not
the performance bottleneck, even for the "ls -lR" workload.  We could
improve xattr negative lookups, taking this as real input.
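
To make the negative-lookup idea concrete, here is a minimal sketch of a
per-inode Bloom filter over xattr names, written in Python for brevity.
It is not EROFS code and not a proposed on-disk format: the class
XattrNameFilter, the getxattr helper, the 64-bit filter size, and the
use of SHA-256 are all assumptions chosen for this illustration.

```python
import hashlib


class XattrNameFilter:
    """Toy Bloom filter over the xattr names present on one inode."""

    def __init__(self, names, bits=64, hashes=2):
        self.bits = bits      # filter width in bits (illustrative value)
        self.hashes = hashes  # number of bit positions set per name
        self.mask = 0
        for name in names:
            for pos in self._positions(name):
                self.mask |= 1 << pos

    def _positions(self, name):
        # Derive the bit positions from one SHA-256 digest; a real
        # implementation would likely pick a much cheaper hash.
        digest = hashlib.sha256(name.encode()).digest()
        for i in range(self.hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "little") % self.bits

    def may_contain(self, name):
        """False: definitely absent.  True: a real lookup is still needed."""
        return all((self.mask >> pos) & 1 for pos in self._positions(name))


def getxattr(inode_xattrs, name_filter, name):
    """Return the value or None, skipping the scan when the filter says no."""
    if not name_filter.may_contain(name):
        return None  # negative lookup answered without touching any xattr
    return inode_xattrs.get(name)


# Example inode carrying only overlayfs xattrs and no ACL.
xattrs = {
    "trusted.overlay.metacopy": b"",
    "trusted.overlay.redirect": b"/a/b",
}
name_filter = XattrNameFilter(xattrs)
print(getxattr(xattrs, name_filter, "trusted.overlay.redirect"))
print(getxattr(xattrs, name_filter, "system.posix_acl_access"))
```

A Bloom filter can report false positives but never false negatives, so
a name that is really present is always found, and most absent names
(such as the ACL xattr in this example) can be rejected without loading
the shared xattrs at all.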

Thanks,
Gao Xiang

> 

Thread overview: 42+ messages
2023-02-27  9:22 [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay Alexander Larsson
2023-02-27 10:45 ` Gao Xiang
2023-02-27 10:58   ` Christian Brauner
2023-04-27 16:11     ` [Lsf-pc] " Amir Goldstein
2023-03-01  3:47   ` Jingbo Xu
2023-03-03 14:41     ` Alexander Larsson
2023-03-03 15:48       ` Gao Xiang
2023-02-27 11:37 ` Jingbo Xu
2023-03-03 13:57 ` Alexander Larsson
2023-03-03 15:13   ` Gao Xiang
2023-03-03 17:37     ` Gao Xiang
2023-03-04 14:59       ` Colin Walters
2023-03-04 15:29         ` Gao Xiang
2023-03-04 16:22           ` Gao Xiang
2023-03-07  1:00           ` Colin Walters
2023-03-07  3:10             ` Gao Xiang
2023-03-07 10:15     ` Christian Brauner
2023-03-07 11:03       ` Gao Xiang
2023-03-07 12:09       ` Alexander Larsson
2023-03-07 12:55         ` Gao Xiang
2023-03-07 15:16         ` Christian Brauner
2023-03-07 19:33           ` Giuseppe Scrivano
2023-03-08 10:31             ` Christian Brauner
2023-03-07 13:38       ` Jeff Layton
2023-03-08 10:37         ` Christian Brauner
2023-03-04  0:46   ` Jingbo Xu
2023-03-06 11:33   ` Alexander Larsson
2023-03-06 12:15     ` Gao Xiang
2023-03-06 15:49     ` Jingbo Xu
2023-03-06 16:09       ` Alexander Larsson
2023-03-06 16:17         ` Gao Xiang
2023-03-07  8:21           ` Alexander Larsson
2023-03-07  8:33             ` Gao Xiang
2023-03-07  8:48               ` Gao Xiang
2023-03-07  9:07               ` Alexander Larsson
2023-03-07  9:26                 ` Gao Xiang
2023-03-07  9:38                   ` Gao Xiang
2023-03-07  9:56                     ` Alexander Larsson
2023-03-07 10:06                       ` Gao Xiang
2023-03-07  9:46                   ` Alexander Larsson
2023-03-07 10:01                     ` Gao Xiang [this message]
2023-03-07 10:00       ` Jingbo Xu
