From: Hongzhen Luo <hongzhen@linux.alibaba.com>
To: Christian Brauner <brauner@kernel.org>,
Gao Xiang <xiang@kernel.org>, Jan Kara <jack@suse.cz>,
Amir Goldstein <amir73il@gmail.com>,
Jeff Layton <jlayton@kernel.org>,
Matthew Wilcox <willy@infradead.org>
Cc: "Daan De Meyer" <daan.j.demeyer@gmail.com>,
"Lennart Poettering" <lennart@poettering.net>,
"Mike Yuan" <me@yhndnzj.com>,
"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
lihongbo22@huawei.com, linux-erofs@lists.ozlabs.org
Subject: Re: [PATCH RFC 0/4] erofs: allow page cache sharing
Date: Sat, 5 Jul 2025 08:51:45 +0800
Message-ID: <3452111b-991f-4be2-8ffd-1172a73feb2e@linux.alibaba.com>
In-Reply-To: <20250703-work-erofs-pcs-v1-0-0ce1f6be28ee@kernel.org>
On 2025/7/3 20:23, Christian Brauner wrote:
> Hey!
>
> This series is originally from Hongzhen. I'm picking it back up because
> support for page cache sharing is pretty important for container and
> service workloads that want to make use of erofs images. The main
> obstacle currently is the inability to share page cache contents between
> different erofs superblocks.
>
> I think the mechanism that Hongzhen came up with is decent and will
> remove one final obstacle.
>
> However, I have not worked in this area in meaningful ways before so to
> an experienced page cache person this might all look like a little kid
> doodling on a piece of paper.
>
> One obvious question mark I have is around mmap. The current
> implementation mimics what overlayfs is doing and I'm not sure that
> it's correct or even necessary to mimic overlayfs behavior here at all.
>
> Anyway, I would really appreciate the help!

Hi Christian,

Glad to hear you're interested in my previous patch series, and please
forgive my delayed response. I was swamped with other tasks and am only
catching up now that it's the weekend.

Due to a change in my work, I can no longer continue driving this patch
series upstream. This version of the series seems to be outdated, and
some of its implementation is quite hacky. Please take a look at the
latest RFC patch series (v6):

https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/

> [Background]
> ============
> Currently, reading files with different paths (or names) but the same
> content will consume multiple copies of the page cache, even if the
> content of these page caches is the same. For example, reading identical
> files (e.g., *.so files) from two different minor versions of container
> images will cost multiple copies of the same page cache, since different
> containers have different mount points. Therefore, sharing the page cache
> for files with the same content can save memory.
>
> [Implementation]
> ================
> This introduces the page cache share feature in erofs. During the mkfs
> phase, the file content is hashed and the hash value is stored in the
> `trusted.erofs.fingerprint` extended attribute. Inodes of files with the
> same `trusted.erofs.fingerprint` are mapped to the same anonymous inode
> (indicated by the `ano_inode` field). When a read request occurs, the
> anonymous inode serves as a "container" whose page cache is shared. The
> actual operations involving the iomap are carried out by the original
> inode which is mapped to the anonymous inode.
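
To make the mechanism above concrete, here is a minimal userspace sketch.
It is a toy model, not the kernel implementation: `ToyPageCache`,
`simulate`, the mount paths and the 4 KiB page size are all hypothetical,
and SHA-256 is an assumption, since the cover letter does not say which
hash mkfs uses. The shared cache key stands in for the anonymous inode.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this toy model


def fingerprint(content: bytes) -> str:
    """Content hash, standing in for `trusted.erofs.fingerprint`.

    SHA-256 is an assumption; the cover letter does not name the hash.
    """
    return hashlib.sha256(content).hexdigest()


class ToyPageCache:
    """Counts cached pages, keyed per (mount, path) or per fingerprint."""

    def __init__(self, share: bool):
        self.share = share
        self.pages = {}  # cache key -> set of cached page indexes

    def read(self, mount: str, path: str, content: bytes) -> None:
        # With sharing, files with equal fingerprints use one key, which
        # plays the role of the shared anonymous inode. Without sharing,
        # every mount keeps its own copy of the pages.
        key = fingerprint(content) if self.share else (mount, path)
        npages = (len(content) + PAGE_SIZE - 1) // PAGE_SIZE
        self.pages.setdefault(key, set()).update(range(npages))

    def cached_pages(self) -> int:
        return sum(len(p) for p in self.pages.values())


def simulate(share: bool) -> int:
    """Read one identical 4-page file through two different mounts."""
    libc = b"\x7fELF" + b"\0" * (3 * PAGE_SIZE)
    cache = ToyPageCache(share)
    cache.read("/mnt/image-a", "/lib/libc.so", libc)
    cache.read("/mnt/image-b", "/lib/libc.so", libc)
    return cache.cached_pages()


if __name__ == "__main__":
    print("pages cached without sharing:", simulate(share=False))
    print("pages cached with sharing:   ", simulate(share=True))
```

In this model the second mount adds no new pages when sharing is on,
which is the saving the series is after.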
>
> [Effect]
> ========
> I conducted experiments on two aspects across two different minor versions of
> container images:
>
> 1. reading all files in two different minor versions of container images
>
> 2. running workloads or using the default entrypoint within the
>    containers [1]
>
> Below is the memory usage for reading all files in two different minor
> versions of container images:
>
> +-------------------+------------------+-------------+---------------+
> |       Image       | Page Cache Share | Memory (MB) |    Memory     |
> |                   |                  |             | Reduction (%) |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     241     |       -       |
> |       redis       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |     163     |      33%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     872     |       -       |
> |     postgres      +------------------+-------------+---------------+
> |    16.1 & 16.2    |        Yes       |     630     |      28%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |    2771     |       -       |
> |    tensorflow     +------------------+-------------+---------------+
> |  1.11.0 & 2.11.1  |        Yes       |    2340     |      16%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     926     |       -       |
> |       mysql       +------------------+-------------+---------------+
> |  8.0.11 & 8.0.12  |        Yes       |     735     |      21%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     390     |       -       |
> |       nginx       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |     219     |      44%      |
> +-------------------+------------------+-------------+---------------+
> |      tomcat       |        No        |     924     |       -       |
> | 10.1.25 & 10.1.26 +------------------+-------------+---------------+
> |                   |        Yes       |     474     |      49%      |
> +-------------------+------------------+-------------+---------------+
>
> Additionally, the table below shows the runtime memory usage of the
> container:
>
> +-------------------+------------------+-------------+---------------+
> |       Image       | Page Cache Share | Memory (MB) |    Memory     |
> |                   |                  |             | Reduction (%) |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     35      |       -       |
> |       redis       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |     28      |      20%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     149     |       -       |
> |     postgres      +------------------+-------------+---------------+
> |    16.1 & 16.2    |        Yes       |     95      |      37%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |    1028     |       -       |
> |    tensorflow     +------------------+-------------+---------------+
> |  1.11.0 & 2.11.1  |        Yes       |     930     |      10%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     155     |       -       |
> |       mysql       +------------------+-------------+---------------+
> |  8.0.11 & 8.0.12  |        Yes       |     132     |      15%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     25      |       -       |
> |       nginx       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |     20      |      20%      |
> +-------------------+------------------+-------------+---------------+
> |      tomcat       |        No        |     186     |       -       |
> | 10.1.25 & 10.1.26 +------------------+-------------+---------------+
> |                   |        Yes       |     98      |      48%      |
> +-------------------+------------------+-------------+---------------+
>
> It can be observed that when reading all the files in the image, the reduced
> memory usage varies from 16% to 49%, depending on the specific image.
> Additionally, the container's runtime memory usage reduction ranges from 10%
> to 48%.
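
For anyone re-checking the percentages, the short script below recomputes
them from the memory figures in the two tables. One assumption to flag:
the cover letter does not state a rounding rule, and rounding up to the
next whole percent is only inferred here because it reproduces every
listed value. The names `reduction_pct`, `READ_ALL` and `RUNTIME` are
mine.

```python
def reduction_pct(without_mb: int, with_mb: int) -> int:
    """Memory saved as a percentage of the unshared figure, rounded up.

    Integer ceiling division avoids floating-point surprises; the
    round-up rule is inferred from the tables, not stated in the text.
    """
    saved = without_mb - with_mb
    return -(-100 * saved // without_mb)


# (memory without sharing, memory with sharing) in MB, from the tables
READ_ALL = {
    "redis": (241, 163),
    "postgres": (872, 630),
    "tensorflow": (2771, 2340),
    "mysql": (926, 735),
    "nginx": (390, 219),
    "tomcat": (924, 474),
}
RUNTIME = {
    "redis": (35, 28),
    "postgres": (149, 95),
    "tensorflow": (1028, 930),
    "mysql": (155, 132),
    "nginx": (25, 20),
    "tomcat": (186, 98),
}

if __name__ == "__main__":
    for label, table in (("read all files", READ_ALL), ("runtime", RUNTIME)):
        pcts = {image: reduction_pct(*mb) for image, mb in table.items()}
        print(label, pcts, "range:", min(pcts.values()), "-", max(pcts.values()))
```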
>
> [1] Below are the workloads for these images:
> - redis: redis-benchmark
> - postgres: sysbench
> - tensorflow: app.py of tensorflow.python.platform
> - mysql: sysbench
> - nginx: wrk
> - tomcat: default entrypoint
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
> Hongzhen Luo (4):
>       erofs: move `struct erofs_anon_fs_type` to super.c
>       erofs: introduce page cache share feature
>       erofs: apply the page cache share feature
>       erofs: introduce .fadvise for page cache share
>
>  fs/erofs/Kconfig           |  10 ++
>  fs/erofs/Makefile          |   1 +
>  fs/erofs/data.c            |  67 +++++++++++
>  fs/erofs/fscache.c         |  13 ---
>  fs/erofs/inode.c           |  15 ++-
>  fs/erofs/internal.h        |  11 ++
>  fs/erofs/pagecache_share.c | 281 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/erofs/pagecache_share.h |  22 ++++
>  fs/erofs/super.c           |  62 ++++++++++
>  fs/erofs/zdata.c           |  32 ++++++
>  10 files changed, 500 insertions(+), 14 deletions(-)
> ---
> base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
> change-id: 20250703-work-erofs-pcs-f6f3d0722401
>
Thread overview: 20+ messages
2025-07-03 12:23 [PATCH RFC 0/4] erofs: allow page cache sharing Christian Brauner
2025-07-03 12:23 ` [PATCH RFC 1/4] erofs: move `struct erofs_anon_fs_type` to super.c Christian Brauner
2025-07-03 12:23 ` [PATCH RFC 2/4] erofs: introduce page cache share feature Christian Brauner
2025-07-04 21:06 ` Gao Xiang
2025-07-05 0:54 ` Hongzhen Luo
2025-07-05 8:25 ` Amir Goldstein
2025-07-05 10:58 ` Gao Xiang
2025-07-05 12:34 ` Amir Goldstein
2025-07-05 12:53 ` Gao Xiang
2025-07-05 13:53 ` Amir Goldstein
2025-07-05 15:14 ` Gao Xiang
2025-07-05 1:09 ` Hongzhen Luo
2025-07-03 12:23 ` [PATCH RFC 3/4] erofs: apply the " Christian Brauner
2025-07-04 20:45 ` Gao Xiang
2025-07-03 12:23 ` [PATCH RFC 4/4] erofs: introduce .fadvise for page cache share Christian Brauner
2025-07-04 21:09 ` Gao Xiang
2025-07-05 1:15 ` Hongzhen Luo
2025-07-05 1:25 ` Gao Xiang
2025-07-03 12:53 ` [PATCH RFC 0/4] erofs: allow page cache sharing Gao Xiang
2025-07-05 0:51 ` Hongzhen Luo [this message]