* [PATCH v16 00/10] erofs: Introduce page cache sharing feature
@ 2026-01-22 13:37 Hongbo Li
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
Enabling page cache sharing in container scenarios has become increasingly
crucial, as it can significantly reduce memory usage. In previous efforts,
Hongzhen has done substantial work to push this feature into the EROFS
mainline. Due to other commitments, he hasn't been able to continue his
work recently, and I'm very pleased to build upon his work and continue
to refine this implementation.
This patch series is based on Hongzhen's original EROFS shared pagecache
implementation which was posted more than half a year ago:
https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/T/#u
I have already made several iterations based on this patch set, resolving
some issues in the code and some pre-requisites.
Note that the two iomap pre-patches from the previous versions
have already been merged into the vfs/iomap branch, see [1][2]. Therefore,
the remaining patches here are mainly related to the EROFS module.
(A recap of Hongzhen's original cover letter is below, edited slightly
for this series:)
Background
==============
Currently, reading files with different paths (or names) but the same
content can consume multiple copies of the page cache, even if the
content of these caches is identical. For example, reading identical
files (e.g., *.so files) from two different minor versions of container
images can result in multiple copies of the same page cache, since
different containers have different mount points. Therefore, sharing
the page cache for files with the same content can save memory.
Proposal
==============
1. determining file identity
----------------------------
First, a way needs to be found to check whether the content of two files
is the same. Here, the xattr values associated with the file
fingerprints are assessed for consistency. When creating the EROFS
image, users can specify the name of the xattr for file fingerprints,
and the corresponding index will be stored in the super block. The on-disk
`ishare_xattr_prefix_id` indicates the index of the xattr item within the
prefix xattrs:
```
struct erofs_super_block {
__u8 xattr_filter_reserved; /* reserved for xattr name filter */
- __u8 reserved[3];
+ __u8 ishare_xattr_prefix_id;
+ __u8 reserved[2];
};
```
For example, users can specify the first long prefix as the name for the
file fingerprint as follows:
```
mkfs.erofs --xattr-inode-digest=trusted.erofs.fingerprint [-zlz4hc] foo.erofs foo/
```
In this way, `trusted.erofs.fingerprint` serves as the name of the xattr
for the file fingerprint. The corresponding support is available in the
erofs-utils experimental branch:
```
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental
```
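As a side note on the on-disk change shown earlier, the new field is carved
out of the previously reserved bytes, so the superblock size and the offsets
of the surrounding fields should stay the same. A minimal userspace sketch of
that property follows; the `sb_tail_old`/`sb_tail_new` structs and the
`ishare_prefix_id_valid()` helper are hypothetical excerpts made up for
illustration, not the real `struct erofs_super_block` or a kernel helper. The
range check mirrors the one added to erofs_read_superblock() in patch 3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt: only the bytes touched by the on-disk change. */
struct sb_tail_old {
	uint8_t xattr_filter_reserved;
	uint8_t reserved[3];
};

struct sb_tail_new {
	uint8_t xattr_filter_reserved;
	uint8_t ishare_xattr_prefix_id;	/* takes over the first reserved byte */
	uint8_t reserved[2];
};

/* The layout must not grow, and the new field sits where reserved[0] was. */
_Static_assert(sizeof(struct sb_tail_old) == sizeof(struct sb_tail_new),
	       "on-disk size changed");
_Static_assert(offsetof(struct sb_tail_new, ishare_xattr_prefix_id) ==
	       offsetof(struct sb_tail_old, reserved),
	       "new field is not placed in the old reserved area");

/* Illustrative helper: the id must index an existing long xattr prefix. */
static int ishare_prefix_id_valid(uint8_t id, uint8_t xattr_prefix_count)
{
	return id < xattr_prefix_count;
}
```

An image whose id is out of range is rejected as corrupted in the actual
patch; the sketch only shows the comparison itself.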
At the same time, we introduce a new mount option, inode_share, to
enable the feature. For security reasons, we allow sharing page cache only
within the same trusted domain by adding "-o domain_id=xxxx" during the
mounting process:
```
mount -t erofs -o inode_share,domain_id=your_shared_domain_id erofs.img /mnt
```
If no domain ID is specified, page cache sharing is not allowed.
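The rule above (sharing needs both inode_share and a non-empty domain ID) can
be sketched as a small predicate. This is a userspace illustration only;
`struct mount_opts` and `sharing_allowed()` are hypothetical names and do not
correspond to the kernel's option-parsing code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed mount options, for illustration only. */
struct mount_opts {
	bool inode_share;	/* "-o inode_share" was given */
	const char *domain_id;	/* "-o domain_id=..." or NULL if absent */
};

/* Sharing is only possible with inode_share and a non-empty domain ID. */
static bool sharing_allowed(const struct mount_opts *o)
{
	return o->inode_share && o->domain_id != NULL &&
	       o->domain_id[0] != '\0';
}
```

How the kernel reacts to a missing domain ID (failing the mount versus
silently disabling sharing) is decided by the patches themselves; the sketch
only models the condition under which sharing may happen.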
2. Implementation
==================
2.1. shared inode creation
--------------------------
When page cache sharing is enabled, an anon inode is created along with
the original inode if the latter carries the fingerprint xattr; this anon
inode is called the sharedinode. Other inodes with the same fingerprint
(i.e. the same content) link to the same sharedinode within the same
trusted domain. The page cache of the anon inode (its i_mapping member) is
the shared page cache, and it is shared by all inodes which have the same
fingerprint and belong to the same trusted domain.
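The lookup described above can be pictured as a get-or-create keyed by the
pair (domain ID, fingerprint). Below is a simplified userspace sketch under
that assumption; `struct shared_inode`, `shared_inode_get()` and
`shared_inode_put()` are hypothetical names, a plain linked list stands in
for whatever lookup structure the kernel code uses, and no locking is shown.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the anon "sharedinode". */
struct shared_inode {
	char *domain_id;
	char *fingerprint;
	unsigned int refcount;	/* number of real inodes linked to it */
	struct shared_inode *next;
};

static struct shared_inode *shared_list;

static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Find the shared inode for (domain, fingerprint) or create a new one. */
static struct shared_inode *shared_inode_get(const char *domain_id,
					     const char *fingerprint)
{
	struct shared_inode *si;

	for (si = shared_list; si; si = si->next) {
		if (!strcmp(si->domain_id, domain_id) &&
		    !strcmp(si->fingerprint, fingerprint)) {
			si->refcount++;
			return si;
		}
	}
	si = calloc(1, sizeof(*si));
	if (!si)
		return NULL;
	si->domain_id = dup_str(domain_id);
	si->fingerprint = dup_str(fingerprint);
	if (!si->domain_id || !si->fingerprint) {
		free(si->domain_id);
		free(si->fingerprint);
		free(si);
		return NULL;
	}
	si->refcount = 1;
	si->next = shared_list;
	shared_list = si;
	return si;
}

/* Drop one link; the shared inode goes away with its last user. */
static void shared_inode_put(struct shared_inode *si)
{
	struct shared_inode **pp;

	if (--si->refcount)
		return;
	for (pp = &shared_list; *pp; pp = &(*pp)->next) {
		if (*pp == si) {
			*pp = si->next;
			break;
		}
	}
	free(si->domain_id);
	free(si->fingerprint);
	free(si);
}
```

Two files with the same fingerprint but different domain IDs end up with
distinct shared inodes in this model, which is the isolation property the
trusted domain is meant to provide.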
2.2. file open & close
----------------------
When the file is opened, the backing file is allocated and the
->private_data field of the file is set to the backing file. The backing
file records the shared inode and serves the later read procedure.
When the actual read occurs, we can obtain both the real inode and the
shared inode. The location information of the real inode is used to
locate the data on disk, and the page cache of the shared inode is
filled.
When the file is closed, the backing file is also released, and the
related references on the real inode and the shared inode are updated
accordingly.
2.3. file reading
-----------------
Only the page cache of the shared inode can be shared. When a read
happens on the sharedinode, we take a reference on the real inode so
that the underlying disk is not released, and drop it again after the
read completes.
There are two possible scenarios when reading a file:
1) the content being read is already present in sharedinode's page cache.
2) the content being read is not present in sharedinode's page cache.
The second scenario involves iomap operations to read the data from
disk.
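The two scenarios can be modelled with a toy page cache: a hit returns the
cached page directly, while a miss first fills the shared cache from the
"disk" reachable through the real inode. This is a userspace sketch only;
`struct shared_cache`, `fill_from_disk()` and `ishare_read_page()` are
hypothetical names, and the real code works on folios through iomap or the
compressed-data path rather than on fixed arrays.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES 4
#define PAGE_SZ  8

/* Toy model of the shared inode's page cache. */
struct shared_cache {
	bool uptodate[NR_PAGES];
	char data[NR_PAGES][PAGE_SZ];
	unsigned int disk_reads;	/* counts simulated disk I/Os */
};

/* Stand-in for the I/O path: data is located via the real inode's disk. */
static void fill_from_disk(struct shared_cache *c,
			   const char (*disk)[PAGE_SZ], size_t idx)
{
	memcpy(c->data[idx], disk[idx], PAGE_SZ);
	c->uptodate[idx] = true;
	c->disk_reads++;
}

/* Return the cached page, reading it from disk first on a miss. */
static const char *ishare_read_page(struct shared_cache *c,
				    const char (*disk)[PAGE_SZ], size_t idx)
{
	if (idx >= NR_PAGES)
		return NULL;
	if (!c->uptodate[idx])		/* scenario 2: not cached yet */
		fill_from_disk(c, disk, idx);
	return c->data[idx];		/* scenario 1: served from cache */
}
```

A second read of the same page, possibly issued through a different file
with the same fingerprint, does not touch the disk again in this model.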
2.3.1. reading existing data in sharedinode's page cache
-------------------------------------------
In this case, the overall read flowchart is as follows (take ksys_read()
for example):
ksys_read
│
│
▼
...
│
│
▼
erofs_ishare_file_read_iter (switch to the backing file)
│
│
▼
read shared page cache & return
At this point, the content in sharedinode's page cache will be read
directly and returned.
2.3.2. reading non-existent content in sharedinode's page cache
---------------------------------------------------
In this case, disk I/O operations will be involved. Taking the reading
of an uncompressed file as an example, here is the reading process:
ksys_read
│
│
▼
...
│
│
▼
erofs_ishare_file_read_iter (switch to the backing file)
│
│
▼
... (allocate pages)
│
│
▼
erofs_read_folio/erofs_readahead (read to shared page cache)
│
│
▼
... (iomap)
│
│
▼
erofs_iomap_begin (located by real inode)
│
│
▼
...
iomap and the layers below it perform the disk I/O. As described
in 2.3, reads on the shared inode are not bound to a specific
filesystem instance; a real backing erofs inode is selected from
the shared list to complete the I/O.
2.4. release shared page cache
-----------------------
Similar to overlayfs, when dropping the shared page cache via .fadvise,
erofs locates the shared backing file and applies vfs_fadvise to release
the shared page cache.
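From userspace, the .fadvise path is typically reached through
posix_fadvise(2) with POSIX_FADV_DONTNEED. A minimal sketch of such a caller
is shown below; `drop_cached_pages()` is a hypothetical helper name, and the
example says nothing about how many pages the kernel actually releases, since
POSIX_FADV_DONTNEED is only advice.

```c
/* Request POSIX.1-2008 declarations unless the build already chose a level. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to drop the cached pages of a whole file.
 * Returns 0 on success, -1 if the file cannot be opened, or the
 * positive error number reported by posix_fadvise().
 */
static int drop_cached_pages(const char *path)
{
	int fd = open(path, O_RDONLY);
	int err;

	if (fd < 0)
		return -1;
	/* offset 0 and len 0 cover the file from start to end */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return err;
}
```

With page cache sharing enabled, such a call on a file from an EROFS mount
would be expected to act on the shared page cache, as described above.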
Effect
==================
I conducted experiments on two aspects across two different minor
versions of container images:
1. reading all files in two different minor versions of container images
2. running workloads or using the default entrypoint within the containers^[I]
Below is the memory usage for reading all files in two different minor
versions of container images:
+-------------------+------------------+-------------+---------------+
| Image | Page Cache Share | Memory (MB) | Memory |
| | | | Reduction (%) |
+-------------------+------------------+-------------+---------------+
| | No | 241 | - |
| redis +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 163 | 33% |
+-------------------+------------------+-------------+---------------+
| | No | 872 | - |
| postgres +------------------+-------------+---------------+
| 16.1 & 16.2 | Yes | 630 | 28% |
+-------------------+------------------+-------------+---------------+
| | No | 2771 | - |
| tensorflow +------------------+-------------+---------------+
| 2.11.0 & 2.11.1 | Yes | 2340 | 16% |
+-------------------+------------------+-------------+---------------+
| | No | 926 | - |
| mysql +------------------+-------------+---------------+
| 8.0.11 & 8.0.12 | Yes | 735 | 21% |
+-------------------+------------------+-------------+---------------+
| | No | 390 | - |
| nginx +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 219 | 44% |
+-------------------+------------------+-------------+---------------+
| tomcat | No | 924 | - |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
| | Yes | 474 | 49% |
+-------------------+------------------+-------------+---------------+
Additionally, the table below shows the runtime memory usage of the
container:
+-------------------+------------------+-------------+---------------+
| Image | Page Cache Share | Memory (MB) | Memory |
| | | | Reduction (%) |
+-------------------+------------------+-------------+---------------+
| | No | 34.9 | - |
| redis +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 33.6 | 4% |
+-------------------+------------------+-------------+---------------+
| | No | 149.1 | - |
| postgres +------------------+-------------+---------------+
| 16.1 & 16.2 | Yes | 95 | 37% |
+-------------------+------------------+-------------+---------------+
| | No | 1027.9 | - |
| tensorflow +------------------+-------------+---------------+
| 2.11.0 & 2.11.1 | Yes | 934.3 | 10% |
+-------------------+------------------+-------------+---------------+
| | No | 155.0 | - |
| mysql +------------------+-------------+---------------+
| 8.0.11 & 8.0.12 | Yes | 139.1 | 11% |
+-------------------+------------------+-------------+---------------+
| | No | 25.4 | - |
| nginx +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 18.8 | 26% |
+-------------------+------------------+-------------+---------------+
| tomcat | No | 186 | - |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
| | Yes | 99 | 47% |
+-------------------+------------------+-------------+---------------+
It can be observed that when reading all the files in the image, the
reduced memory usage varies from 16% to 49%, depending on the specific
image. Additionally, the container's runtime memory usage reduction
ranges from 4% to 47%.
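For reference, the reduction column can be reproduced from the two memory
columns as (without - with) / without. The helper below is a small sketch of
that arithmetic; `reduction_pct()` is a hypothetical name. Recomputing the
rows this way suggests that the table values are rounded up to the next whole
percent (for example, 241 MB versus 163 MB gives roughly 32.4%, listed as
33%).

```c
/*
 * Memory reduction in percent when usage drops from before_mb to
 * after_mb. Returns 0 for a non-positive baseline to avoid dividing
 * by zero.
 */
static double reduction_pct(double before_mb, double after_mb)
{
	if (before_mb <= 0.0)
		return 0.0;
	return (before_mb - after_mb) * 100.0 / before_mb;
}
```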
[I] Below are the workloads for these images:
- redis: redis-benchmark
- postgres: sysbench
- tensorflow: app.py of tensorflow.python.platform
- mysql: sysbench
- nginx: wrk
- tomcat: default entrypoint
Changes from v15:
- Patch 4: add erofs_inode_set_aops and use IS_ENABLED in a separate patch as
suggested by Christoph.
- Patch 5: handle domain_id in a safer way: alloc/free, do not show it to
userspace in the sharing case, and update the notation in the doc as
suggested by Xiang.
- Patch 6: use #ifdef as suggested by Christoph and don't allow empty
domain_id when inode_share is on as suggested by Xiang.
- Patch 10: remove extra pointer cast as suggested by Christoph.
Changes from v14:
- Patch 5: add erofs_inode_set_aops helper to simplify the code and add log
when INODE_SHARE is on as suggested by Xiang. Add inode_drop when
sharedinode is an orphan and skip fill fingerprint when xattr is not ready.
- Patch 6: newly added, to pass inode into the tracepoint helper.
- Patch 7: move tracepoint related changes out and simplify the code
as suggested by Xiang.
- Patch 8: the compressed related one, add reviewed-by.
Changes from v13:
- Patch 7: do some minor cleanup as suggested by Xiang.
- Patch 8,9: use open-code style as suggested by Xiang and pass the
realinode to trace_erofs_read_folio.
Changes from v12:
- Patch 5: add reviewed-by.
- Patch 7: only allow non-direct I/O in open for sharing feature, mask
INODE_SHARE if sb without ishare_xattrs, simplify the code and better
naming as suggested by Xiang.
- Patch 8: remove unused macro as suggested by Xiang.
- Patch 9: minor cleanup as suggested by Xiang.
Changes from v11:
- Patch 4: apply with Xiang's patch.
- Patch 5: do not mask the xattr_prefix_id in disk and fix the compiling
error when disable XATTR config.
- Patch 6,10: add reviewed-by.
- Patch 7,8: make inode_share excluded with DAX feature, do
some cleanup on typo and other code-style as suggested by Xiang.
- Patch 9: use realinode and shareinode in the compressed case to access
metadata and page cache separately, and remove some useless
code as suggested by Xiang.
Changes from v10:
- add reviewed-by and acked-by.
- do some cleanup on typo, useless code and some helpers' name.
- use fingerprint struct and introduce inode_share mount option as
suggested by Xiang.
Changes from v9:
- make shared page cache a compatible feature.
- refine code style as suggested by Xiang.
- init ishare mnt during the module init as suggested by Xiang.
- rebase the latest mainline and fix the comments in cover letter.
Changes from v8:
- add review-by in patch 1 and patch 10.
- do some clean up in patch 2 and patch 4,6,9 as suggested by Xiang.
- add new patch 3 to export alloc_empty_backing_file.
- patch 5 only use xattr prefix id to record the ishare info, changed
config to EROFS_FS_PAGE_CACHE_SHARE and make it compatible.
- patch 7 use backing file helpers to alloc file when ishare file is
opened as suggested by Xiang.
- patch 8 remove erofs_read_{begin,end} as suggested by Xiang.
v15: https://lore.kernel.org/all/20260116095550.627082-1-lihongbo22@huawei.com/
v14: https://lore.kernel.org/all/20260109102856.598531-1-lihongbo22@huawei.com/
v13: https://lore.kernel.org/all/20260109030140.594936-1-lihongbo22@huawei.com/
v12: https://lore.kernel.org/all/20251231090118.541061-1-lihongbo22@huawei.com/
v11: https://lore.kernel.org/all/20251224040932.496478-1-lihongbo22@huawei.com/
v10: https://lore.kernel.org/all/20251223015618.485626-1-lihongbo22@huawei.com/
v9: https://lore.kernel.org/all/20251117132537.227116-1-lihongbo22@huawei.com/
v8: https://lore.kernel.org/all/20251114095516.207555-1-lihongbo22@huawei.com/
v7: https://lore.kernel.org/all/20251021104815.70662-1-lihongbo22@huawei.com/
v6: https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/T/#u
v5: https://lore.kernel.org/all/20250105151208.3797385-1-hongzhen@linux.alibaba.com/
v4: https://lore.kernel.org/all/20240902110620.2202586-1-hongzhen@linux.alibaba.com/
v3: https://lore.kernel.org/all/20240828111959.3677011-1-hongzhen@linux.alibaba.com/
v2: https://lore.kernel.org/all/20240731080704.678259-1-hongzhen@linux.alibaba.com/
v1: https://lore.kernel.org/all/20240722065355.1396365-1-hongzhen@linux.alibaba.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?id=8806f279244b
[2] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?id=8d407bb32186
Gao Xiang (1):
erofs: decouple `struct erofs_anon_fs_type`
Hongbo Li (5):
fs: Export alloc_empty_backing_file
erofs: add erofs_inode_set_aops helper to set the aops.
erofs: using domain_id in the safer way
erofs: pass inode to trace_erofs_read_folio
erofs: support unencoded inodes for page cache share
Hongzhen Luo (4):
erofs: support user-defined fingerprint name
erofs: introduce the page cache share feature
erofs: support compressed inodes for page cache share
erofs: implement .fadvise for page cache share
Documentation/filesystems/erofs.rst | 10 +-
fs/erofs/Kconfig | 9 ++
fs/erofs/Makefile | 1 +
fs/erofs/data.c | 36 +++--
fs/erofs/erofs_fs.h | 5 +-
fs/erofs/fileio.c | 25 ++--
fs/erofs/fscache.c | 17 +--
fs/erofs/inode.c | 27 +---
fs/erofs/internal.h | 64 +++++++++
fs/erofs/ishare.c | 206 ++++++++++++++++++++++++++++
fs/erofs/super.c | 89 +++++++++++-
fs/erofs/xattr.c | 47 +++++++
fs/erofs/xattr.h | 3 +
fs/erofs/zdata.c | 38 +++--
fs/file_table.c | 1 +
include/trace/events/erofs.h | 10 +-
16 files changed, 501 insertions(+), 87 deletions(-)
create mode 100644 fs/erofs/ishare.c
--
2.22.0
* [PATCH v16 01/10] fs: Export alloc_empty_backing_file
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
There is no need to open nonexistent real files if backing files
couldn't be backed by real files (e.g., EROFS page cache sharing
doesn't need typical real files to open again).
Therefore, we export the alloc_empty_backing_file() helper, allowing
filesystems to dynamically set the backing file without real file
open. This is particularly useful for obtaining the correct @path
and @inode when calling file_user_path() and file_user_inode().
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
---
fs/file_table.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/file_table.c b/fs/file_table.c
index cd4a3db4659a..476edfe7d8f5 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -308,6 +308,7 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
return &ff->file;
}
+EXPORT_SYMBOL_GPL(alloc_empty_backing_file);
/**
* file_init_path - initialize a 'struct file' based on path
--
2.22.0
* [PATCH v16 02/10] erofs: decouple `struct erofs_anon_fs_type`
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
From: Gao Xiang <hsiangkao@linux.alibaba.com>
- Move the `struct erofs_anon_fs_type` to super.c and expose it
in preparation for the upcoming page cache share feature;
- Remove the `.owner` field, as they are all internal mounts and
fully managed by EROFS. Retaining `.owner` would unnecessarily
increment module reference counts, preventing the EROFS kernel
module from being unloaded.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
fs/erofs/fscache.c | 13 -------------
fs/erofs/internal.h | 2 ++
fs/erofs/super.c | 14 ++++++++++++++
3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 7a346e20f7b7..f4937b025038 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -3,7 +3,6 @@
* Copyright (C) 2022, Alibaba Cloud
* Copyright (C) 2022, Bytedance Inc. All rights reserved.
*/
-#include <linux/pseudo_fs.h>
#include <linux/fscache.h>
#include "internal.h"
@@ -13,18 +12,6 @@ static LIST_HEAD(erofs_domain_list);
static LIST_HEAD(erofs_domain_cookies_list);
static struct vfsmount *erofs_pseudo_mnt;
-static int erofs_anon_init_fs_context(struct fs_context *fc)
-{
- return init_pseudo(fc, EROFS_SUPER_MAGIC) ? 0 : -ENOMEM;
-}
-
-static struct file_system_type erofs_anon_fs_type = {
- .owner = THIS_MODULE,
- .name = "pseudo_erofs",
- .init_fs_context = erofs_anon_init_fs_context,
- .kill_sb = kill_anon_super,
-};
-
struct erofs_fscache_io {
struct netfs_cache_resources cres;
struct iov_iter iter;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f7f622836198..98fe652aea33 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -188,6 +188,8 @@ static inline bool erofs_is_fileio_mode(struct erofs_sb_info *sbi)
return IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) && sbi->dif0.file;
}
+extern struct file_system_type erofs_anon_fs_type;
+
static inline bool erofs_is_fscache_mode(struct super_block *sb)
{
return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) &&
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 937a215f626c..f18f43b78fca 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -11,6 +11,7 @@
#include <linux/fs_parser.h>
#include <linux/exportfs.h>
#include <linux/backing-dev.h>
+#include <linux/pseudo_fs.h>
#include "xattr.h"
#define CREATE_TRACE_POINTS
@@ -936,6 +937,19 @@ static struct file_system_type erofs_fs_type = {
};
MODULE_ALIAS_FS("erofs");
+#if defined(CONFIG_EROFS_FS_ONDEMAND)
+static int erofs_anon_init_fs_context(struct fs_context *fc)
+{
+ return init_pseudo(fc, EROFS_SUPER_MAGIC) ? 0 : -ENOMEM;
+}
+
+struct file_system_type erofs_anon_fs_type = {
+ .name = "pseudo_erofs",
+ .init_fs_context = erofs_anon_init_fs_context,
+ .kill_sb = kill_anon_super,
+};
+#endif
+
static int __init erofs_module_init(void)
{
int err;
--
2.22.0
* [PATCH v16 03/10] erofs: support user-defined fingerprint name
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
From: Hongzhen Luo <hongzhen@linux.alibaba.com>
When creating the EROFS image, users can specify the fingerprint name.
This is to prepare for the upcoming inode page cache share.
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
fs/erofs/Kconfig | 9 +++++++++
fs/erofs/erofs_fs.h | 5 +++--
fs/erofs/internal.h | 2 ++
fs/erofs/super.c | 9 +++++++++
fs/erofs/xattr.c | 13 +++++++++++++
5 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index d81f3318417d..b71f2a8074fe 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -194,3 +194,12 @@ config EROFS_FS_PCPU_KTHREAD_HIPRI
at higher priority.
If unsure, say N.
+
+config EROFS_FS_PAGE_CACHE_SHARE
+ bool "EROFS page cache share support (experimental)"
+ depends on EROFS_FS && EROFS_FS_XATTR && !EROFS_FS_ONDEMAND
+ help
+ This enables page cache sharing among inodes with identical
+ content fingerprints on the same machine.
+
+ If unsure, say N.
diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index e24268acdd62..b30a74d307c5 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -17,7 +17,7 @@
#define EROFS_FEATURE_COMPAT_XATTR_FILTER 0x00000004
#define EROFS_FEATURE_COMPAT_SHARED_EA_IN_METABOX 0x00000008
#define EROFS_FEATURE_COMPAT_PLAIN_XATTR_PFX 0x00000010
-
+#define EROFS_FEATURE_COMPAT_ISHARE_XATTRS 0x00000020
/*
* Any bits that aren't in EROFS_ALL_FEATURE_INCOMPAT should
@@ -83,7 +83,8 @@ struct erofs_super_block {
__le32 xattr_prefix_start; /* start of long xattr prefixes */
__le64 packed_nid; /* nid of the special packed inode */
__u8 xattr_filter_reserved; /* reserved for xattr name filter */
- __u8 reserved[3];
+ __u8 ishare_xattr_prefix_id;
+ __u8 reserved[2];
__le32 build_time; /* seconds added to epoch for mkfs time */
__le64 rootnid_8b; /* (48BIT on) nid of root directory */
__le64 reserved2;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 98fe652aea33..ec79e8b44d3b 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -134,6 +134,7 @@ struct erofs_sb_info {
u32 xattr_blkaddr;
u32 xattr_prefix_start;
u8 xattr_prefix_count;
+ u8 ishare_xattr_prefix_id;
struct erofs_xattr_prefix_item *xattr_prefixes;
unsigned int xattr_filter_reserved;
#endif
@@ -238,6 +239,7 @@ EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
EROFS_FEATURE_FUNCS(shared_ea_in_metabox, compat, COMPAT_SHARED_EA_IN_METABOX)
EROFS_FEATURE_FUNCS(plain_xattr_pfx, compat, COMPAT_PLAIN_XATTR_PFX)
+EROFS_FEATURE_FUNCS(ishare_xattrs, compat, COMPAT_ISHARE_XATTRS)
static inline u64 erofs_nid_to_ino64(struct erofs_sb_info *sbi, erofs_nid_t nid)
{
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index f18f43b78fca..dca1445f6c92 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -320,6 +320,15 @@ static int erofs_read_superblock(struct super_block *sb)
sbi->xattr_prefix_start = le32_to_cpu(dsb->xattr_prefix_start);
sbi->xattr_prefix_count = dsb->xattr_prefix_count;
sbi->xattr_filter_reserved = dsb->xattr_filter_reserved;
+ if (erofs_sb_has_ishare_xattrs(sbi)) {
+ if (dsb->ishare_xattr_prefix_id >= sbi->xattr_prefix_count) {
+ erofs_err(sb, "invalid ishare xattr prefix id %u",
+ dsb->ishare_xattr_prefix_id);
+ ret = -EFSCORRUPTED;
+ goto out;
+ }
+ sbi->ishare_xattr_prefix_id = dsb->ishare_xattr_prefix_id;
+ }
#endif
sbi->islotbits = ilog2(sizeof(struct erofs_inode_compact));
if (erofs_sb_has_48bit(sbi) && dsb->rootnid_8b) {
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index 396536d9a862..ae61f20cb861 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -519,6 +519,19 @@ int erofs_xattr_prefixes_init(struct super_block *sb)
}
erofs_put_metabuf(&buf);
+ if (!ret && erofs_sb_has_ishare_xattrs(sbi)) {
+ struct erofs_xattr_prefix_item *pf = pfs + sbi->ishare_xattr_prefix_id;
+ struct erofs_xattr_long_prefix *newpfx;
+
+ newpfx = krealloc(pf->prefix,
+ sizeof(*newpfx) + pf->infix_len + 1, GFP_KERNEL);
+ if (newpfx) {
+ newpfx->infix[pf->infix_len] = '\0';
+ pf->prefix = newpfx;
+ } else {
+ ret = -ENOMEM;
+ }
+ }
sbi->xattr_prefixes = pfs;
if (ret)
erofs_xattr_prefixes_cleanup(sb);
--
2.22.0
* [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops.
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
Add the erofs_inode_set_aops helper to set inode->i_mapping->a_ops,
and use IS_ENABLED to make it cleaner.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
---
fs/erofs/inode.c | 23 +----------------------
fs/erofs/internal.h | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+), 22 deletions(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index bce98c845a18..389632bb46c4 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -235,28 +235,7 @@ static int erofs_fill_inode(struct inode *inode)
}
mapping_set_large_folios(inode->i_mapping);
- if (erofs_inode_is_data_compressed(vi->datalayout)) {
-#ifdef CONFIG_EROFS_FS_ZIP
- DO_ONCE_LITE_IF(inode->i_blkbits != PAGE_SHIFT,
- erofs_info, inode->i_sb,
- "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
- inode->i_mapping->a_ops = &z_erofs_aops;
-#else
- err = -EOPNOTSUPP;
-#endif
- } else {
- inode->i_mapping->a_ops = &erofs_aops;
-#ifdef CONFIG_EROFS_FS_ONDEMAND
- if (erofs_is_fscache_mode(inode->i_sb))
- inode->i_mapping->a_ops = &erofs_fscache_access_aops;
-#endif
-#ifdef CONFIG_EROFS_FS_BACKED_BY_FILE
- if (erofs_is_fileio_mode(EROFS_SB(inode->i_sb)))
- inode->i_mapping->a_ops = &erofs_fileio_aops;
-#endif
- }
-
- return err;
+ return erofs_inode_set_aops(inode, inode, false);
}
/*
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index ec79e8b44d3b..8e28c2fa8735 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -455,6 +455,29 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
return NULL;
}
+static inline int erofs_inode_set_aops(struct inode *inode,
+ struct inode *realinode, bool no_fscache)
+{
+ if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
+ if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
+ return -EOPNOTSUPP;
+ DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
+ erofs_info, realinode->i_sb,
+ "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
+ inode->i_mapping->a_ops = &z_erofs_aops;
+ return 0;
+ }
+ inode->i_mapping->a_ops = &erofs_aops;
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) {
+ if (!no_fscache && erofs_is_fscache_mode(realinode->i_sb))
+ inode->i_mapping->a_ops = &erofs_fscache_access_aops;
+ } else {
+ if (erofs_is_fileio_mode(EROFS_SB(realinode->i_sb)))
+ inode->i_mapping->a_ops = &erofs_fileio_aops;
+ }
+ return 0;
+}
+
int erofs_register_sysfs(struct super_block *sb);
void erofs_unregister_sysfs(struct super_block *sb);
int __init erofs_init_sysfs(void);
--
2.22.0
* [PATCH v16 05/10] erofs: using domain_id in the safer way
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
In both the existing fscache use case and the upcoming page
cache sharing case, the `domain_id` should be protected as
sensitive information, so we use the safer helpers to allocate,
free and display domain_id.
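For readers less familiar with kfree_sensitive(): it clears the buffer before
handing it back to the allocator, so the old contents do not linger in freed
memory. A rough userspace analogue is sketched below; `wipe()` and
`free_sensitive_str()` are hypothetical names, and the volatile write loop is
only one common way to keep the compiler from optimizing the clearing away.

```c
#include <stdlib.h>
#include <string.h>

/* Zero a buffer through a volatile pointer so the stores are not elided. */
static void wipe(void *p, size_t len)
{
	volatile unsigned char *vp = p;

	while (len--)
		*vp++ = 0;
}

/* Clear a heap-allocated string before freeing it; NULL is accepted. */
static void free_sensitive_str(char *s)
{
	if (!s)
		return;
	wipe(s, strlen(s));
	free(s);
}
```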
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
---
Documentation/filesystems/erofs.rst | 5 +++--
fs/erofs/fscache.c | 4 ++--
fs/erofs/super.c | 8 ++++----
3 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
index 08194f194b94..40dbf3b6a35f 100644
--- a/Documentation/filesystems/erofs.rst
+++ b/Documentation/filesystems/erofs.rst
@@ -126,8 +126,9 @@ dax={always,never} Use direct access (no page cache). See
dax A legacy option which is an alias for ``dax=always``.
device=%s Specify a path to an extra device to be used together.
fsid=%s Specify a filesystem image ID for Fscache back-end.
-domain_id=%s Specify a domain ID in fscache mode so that different images
- with the same blobs under a given domain ID can share storage.
+domain_id=%s Specify a trusted domain ID for fscache mode so that
+ different images with the same blobs, identified by blob IDs,
+ can share storage within the same trusted domain.
fsoffset=%llu Specify block-aligned filesystem offset for the primary device.
=================== =========================================================
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index f4937b025038..a2cc0f3fa9d0 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -379,7 +379,7 @@ static void erofs_fscache_domain_put(struct erofs_domain *domain)
}
fscache_relinquish_volume(domain->volume, NULL, false);
mutex_unlock(&erofs_domain_list_lock);
- kfree(domain->domain_id);
+ kfree_sensitive(domain->domain_id);
kfree(domain);
return;
}
@@ -446,7 +446,7 @@ static int erofs_fscache_init_domain(struct super_block *sb)
sbi->domain = domain;
return 0;
out:
- kfree(domain->domain_id);
+ kfree_sensitive(domain->domain_id);
kfree(domain);
return err;
}
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index dca1445f6c92..6fbe9220303a 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -525,8 +525,8 @@ static int erofs_fc_parse_param(struct fs_context *fc,
return -ENOMEM;
break;
case Opt_domain_id:
- kfree(sbi->domain_id);
- sbi->domain_id = kstrdup(param->string, GFP_KERNEL);
+ kfree_sensitive(sbi->domain_id);
+ sbi->domain_id = no_free_ptr(param->string);
if (!sbi->domain_id)
return -ENOMEM;
break;
@@ -624,7 +624,7 @@ static void erofs_set_sysfs_name(struct super_block *sb)
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
- if (sbi->domain_id)
+ if (sbi->domain_id && sbi->fsid)
super_set_sysfs_name_generic(sb, "%s,%s", sbi->domain_id,
sbi->fsid);
else if (sbi->fsid)
@@ -852,7 +852,7 @@ static void erofs_sb_free(struct erofs_sb_info *sbi)
{
erofs_free_dev_context(sbi->devs);
kfree(sbi->fsid);
- kfree(sbi->domain_id);
+ kfree_sensitive(sbi->domain_id);
if (sbi->dif0.file)
fput(sbi->dif0.file);
kfree(sbi->volume_name);
--
2.22.0
* [PATCH v16 06/10] erofs: introduce the page cache share feature
2026-01-22 13:37 [PATCH v16 00/10] erofs: Introduce page cache sharing feature Hongbo Li
` (4 preceding siblings ...)
2026-01-22 13:37 ` [PATCH v16 05/10] erofs: using domain_id in the safer way Hongbo Li
@ 2026-01-22 13:37 ` Hongbo Li
2026-01-22 14:01 ` Gao Xiang
2026-01-22 13:37 ` [PATCH v16 07/10] erofs: pass inode to trace_erofs_read_folio Hongbo Li
` (3 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
From: Hongzhen Luo <hongzhen@linux.alibaba.com>
Currently, reading files with different paths (or names) but the same
content will consume multiple copies of the page cache, even if the
content of these page caches is the same. For example, reading
identical files (e.g., *.so files) from two different minor versions of
container images will cost multiple copies of the same page cache,
since different containers have different mount points. Therefore,
sharing the page cache for files with the same content can save memory.
This introduces the page cache share feature in erofs. It allocates a
shared inode and uses its page cache as the shared one. Reads for files
with identical content will ultimately be routed to the page cache of
the shared inode. In this way, a single page cache satisfies
multiple read requests for different files with the same contents.
We introduce a new mount option `inode_share` to enable the page cache
sharing mode during mounting. This option is used in conjunction
with `domain_id` to share the page cache within the same trusted
domain.
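As an illustration of the matching rule described above, here is a minimal
userspace sketch (not the kernel code) of a sharing key built from a per-file
content fingerprint followed by the domain ID, with two keys compared
byte-wise. The names `share_key`, `share_key_build`, `share_key_equal` and
`share_key_free` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model: content fingerprint + domain ID. */
struct share_key {
	unsigned char *opaque;
	size_t size;
};

/* Concatenate fingerprint bytes and domain ID; 0 on success, -1 on failure. */
static int share_key_build(struct share_key *key, const void *fp,
			   size_t fp_len, const char *domain_id)
{
	size_t dom_len = domain_id ? strlen(domain_id) : 0;

	assert(fp && fp_len > 0);
	key->size = fp_len + dom_len;
	key->opaque = malloc(key->size);
	if (!key->opaque)
		return -1;
	memcpy(key->opaque, fp, fp_len);
	if (dom_len)
		memcpy(key->opaque + fp_len, domain_id, dom_len);
	return 0;
}

/* Two files may share a page cache only if the whole key matches. */
static bool share_key_equal(const struct share_key *a,
			    const struct share_key *b)
{
	return a->size == b->size && !memcmp(a->opaque, b->opaque, a->size);
}

static void share_key_free(struct share_key *key)
{
	free(key->opaque);
	key->opaque = NULL;
	key->size = 0;
}
```

With this model, the same fingerprint under two different domain IDs yields
two distinct keys, so sharing stays confined to one trusted domain.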
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
---
Documentation/filesystems/erofs.rst | 5 +
fs/erofs/Makefile | 1 +
fs/erofs/inode.c | 1 -
fs/erofs/internal.h | 31 ++++++
fs/erofs/ishare.c | 167 ++++++++++++++++++++++++++++
fs/erofs/super.c | 62 ++++++++++-
fs/erofs/xattr.c | 34 ++++++
fs/erofs/xattr.h | 3 +
8 files changed, 301 insertions(+), 3 deletions(-)
create mode 100644 fs/erofs/ishare.c
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
index 40dbf3b6a35f..bfef8e87f299 100644
--- a/Documentation/filesystems/erofs.rst
+++ b/Documentation/filesystems/erofs.rst
@@ -129,7 +129,12 @@ fsid=%s Specify a filesystem image ID for Fscache back-end.
domain_id=%s Specify a trusted domain ID for fscache mode so that
different images with the same blobs, identified by blob IDs,
can share storage within the same trusted domain.
+ Also used for different filesystems with inode page sharing
+ enabled to share page cache within the trusted domain.
fsoffset=%llu Specify block-aligned filesystem offset for the primary device.
+inode_share Enable inode page sharing for this filesystem. Inodes with
+ identical content within the same domain ID can share the
+ page cache.
=================== =========================================================
Sysfs Entries
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 549abc424763..a80e1762b607 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -10,3 +10,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
erofs-$(CONFIG_EROFS_FS_ZIP_ACCEL) += decompressor_crypto.o
erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
+erofs-$(CONFIG_EROFS_FS_PAGE_CACHE_SHARE) += ishare.o
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 389632bb46c4..202cbbb4eada 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -203,7 +203,6 @@ static int erofs_read_inode(struct inode *inode)
static int erofs_fill_inode(struct inode *inode)
{
- struct erofs_inode *vi = EROFS_I(inode);
int err;
trace_erofs_fill_inode(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 8e28c2fa8735..1061faf43868 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -179,6 +179,7 @@ struct erofs_sb_info {
#define EROFS_MOUNT_DAX_ALWAYS 0x00000040
#define EROFS_MOUNT_DAX_NEVER 0x00000080
#define EROFS_MOUNT_DIRECT_IO 0x00000100
+#define EROFS_MOUNT_INODE_SHARE 0x00000200
#define clear_opt(opt, option) ((opt)->mount_opt &= ~EROFS_MOUNT_##option)
#define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option)
@@ -269,6 +270,11 @@ static inline u64 erofs_nid_to_ino64(struct erofs_sb_info *sbi, erofs_nid_t nid)
/* default readahead size of directories */
#define EROFS_DIR_RA_BYTES 16384
+struct erofs_inode_fingerprint {
+ u8 *opaque;
+ int size;
+};
+
struct erofs_inode {
erofs_nid_t nid;
@@ -304,6 +310,18 @@ struct erofs_inode {
};
#endif /* CONFIG_EROFS_FS_ZIP */
};
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+ struct list_head ishare_list;
+ union {
+ /* for each anon shared inode */
+ struct {
+ struct erofs_inode_fingerprint fingerprint;
+ spinlock_t ishare_lock;
+ };
+ /* for each real inode */
+ struct inode *sharedinode;
+ };
+#endif
/* the corresponding vfs inode */
struct inode vfs_inode;
};
@@ -410,6 +428,7 @@ extern const struct inode_operations erofs_dir_iops;
extern const struct file_operations erofs_file_fops;
extern const struct file_operations erofs_dir_fops;
+extern const struct file_operations erofs_ishare_fops;
extern const struct iomap_ops z_erofs_iomap_report_ops;
@@ -564,6 +583,18 @@ static inline struct bio *erofs_fscache_bio_alloc(struct erofs_map_dev *mdev) {
static inline void erofs_fscache_submit_bio(struct bio *bio) {}
#endif
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+int __init erofs_init_ishare(void);
+void erofs_exit_ishare(void);
+bool erofs_ishare_fill_inode(struct inode *inode);
+void erofs_ishare_free_inode(struct inode *inode);
+#else
+static inline int erofs_init_ishare(void) { return 0; }
+static inline void erofs_exit_ishare(void) {}
+static inline bool erofs_ishare_fill_inode(struct inode *inode) { return false; }
+static inline void erofs_ishare_free_inode(struct inode *inode) {}
+#endif
+
long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg);
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
new file mode 100644
index 000000000000..3d26b2826710
--- /dev/null
+++ b/fs/erofs/ishare.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2024, Alibaba Cloud
+ */
+#include <linux/xxhash.h>
+#include <linux/mount.h>
+#include "internal.h"
+#include "xattr.h"
+
+#include "../internal.h"
+
+static struct vfsmount *erofs_ishare_mnt;
+
+static int erofs_ishare_iget5_eq(struct inode *inode, void *data)
+{
+ struct erofs_inode_fingerprint *fp1 = &EROFS_I(inode)->fingerprint;
+ struct erofs_inode_fingerprint *fp2 = data;
+
+ return fp1->size == fp2->size &&
+ !memcmp(fp1->opaque, fp2->opaque, fp2->size);
+}
+
+static int erofs_ishare_iget5_set(struct inode *inode, void *data)
+{
+ struct erofs_inode *vi = EROFS_I(inode);
+
+ vi->fingerprint = *(struct erofs_inode_fingerprint *)data;
+ INIT_LIST_HEAD(&vi->ishare_list);
+ spin_lock_init(&vi->ishare_lock);
+ return 0;
+}
+
+bool erofs_ishare_fill_inode(struct inode *inode)
+{
+ struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
+ struct erofs_inode *vi = EROFS_I(inode);
+ struct erofs_inode_fingerprint fp;
+ struct inode *sharedinode;
+ unsigned long hash;
+
+ if (erofs_xattr_fill_inode_fingerprint(&fp, inode, sbi->domain_id))
+ return false;
+ hash = xxh32(fp.opaque, fp.size, 0);
+ sharedinode = iget5_locked(erofs_ishare_mnt->mnt_sb, hash,
+ erofs_ishare_iget5_eq, erofs_ishare_iget5_set,
+ &fp);
+ if (!sharedinode) {
+ kfree(fp.opaque);
+ return false;
+ }
+
+ if (inode_state_read_once(sharedinode) & I_NEW) {
+ if (erofs_inode_set_aops(sharedinode, inode, true)) {
+ iget_failed(sharedinode);
+ kfree(fp.opaque);
+ return false;
+ }
+ sharedinode->i_size = vi->vfs_inode.i_size;
+ unlock_new_inode(sharedinode);
+ } else {
+ kfree(fp.opaque);
+ if (sharedinode->i_size != vi->vfs_inode.i_size) {
+ _erofs_printk(inode->i_sb, KERN_WARNING
+ "size(%lld:%lld) does not match for the same fingerprint\n",
+ vi->vfs_inode.i_size, sharedinode->i_size);
+ iput(sharedinode);
+ return false;
+ }
+ }
+ vi->sharedinode = sharedinode;
+ INIT_LIST_HEAD(&vi->ishare_list);
+ spin_lock(&EROFS_I(sharedinode)->ishare_lock);
+ list_add(&vi->ishare_list, &EROFS_I(sharedinode)->ishare_list);
+ spin_unlock(&EROFS_I(sharedinode)->ishare_lock);
+ return true;
+}
+
+void erofs_ishare_free_inode(struct inode *inode)
+{
+ struct erofs_inode *vi = EROFS_I(inode);
+ struct inode *sharedinode = vi->sharedinode;
+
+ if (!sharedinode)
+ return;
+ spin_lock(&EROFS_I(sharedinode)->ishare_lock);
+ list_del(&vi->ishare_list);
+ spin_unlock(&EROFS_I(sharedinode)->ishare_lock);
+ iput(sharedinode);
+ vi->sharedinode = NULL;
+}
+
+static int erofs_ishare_file_open(struct inode *inode, struct file *file)
+{
+ struct inode *sharedinode = EROFS_I(inode)->sharedinode;
+ struct file *realfile;
+
+ if (file->f_flags & O_DIRECT)
+ return -EINVAL;
+ realfile = alloc_empty_backing_file(O_RDONLY|O_NOATIME, current_cred());
+ if (IS_ERR(realfile))
+ return PTR_ERR(realfile);
+ ihold(sharedinode);
+ realfile->f_op = &erofs_file_fops;
+ realfile->f_inode = sharedinode;
+ realfile->f_mapping = sharedinode->i_mapping;
+ path_get(&file->f_path);
+ backing_file_set_user_path(realfile, &file->f_path);
+
+ file_ra_state_init(&realfile->f_ra, file->f_mapping);
+ realfile->private_data = EROFS_I(inode);
+ file->private_data = realfile;
+ return 0;
+}
+
+static int erofs_ishare_file_release(struct inode *inode, struct file *file)
+{
+ struct file *realfile = file->private_data;
+
+ iput(realfile->f_inode);
+ fput(realfile);
+ file->private_data = NULL;
+ return 0;
+}
+
+static ssize_t erofs_ishare_file_read_iter(struct kiocb *iocb,
+ struct iov_iter *to)
+{
+ struct file *realfile = iocb->ki_filp->private_data;
+ struct kiocb dedup_iocb;
+ ssize_t nread;
+
+ if (!iov_iter_count(to))
+ return 0;
+ kiocb_clone(&dedup_iocb, iocb, realfile);
+ nread = filemap_read(&dedup_iocb, to, 0);
+ iocb->ki_pos = dedup_iocb.ki_pos;
+ return nread;
+}
+
+static int erofs_ishare_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct file *realfile = file->private_data;
+
+ vma_set_file(vma, realfile);
+ return generic_file_readonly_mmap(file, vma);
+}
+
+const struct file_operations erofs_ishare_fops = {
+ .open = erofs_ishare_file_open,
+ .llseek = generic_file_llseek,
+ .read_iter = erofs_ishare_file_read_iter,
+ .mmap = erofs_ishare_mmap,
+ .release = erofs_ishare_file_release,
+ .get_unmapped_area = thp_get_unmapped_area,
+ .splice_read = filemap_splice_read,
+};
+
+int __init erofs_init_ishare(void)
+{
+ erofs_ishare_mnt = kern_mount(&erofs_anon_fs_type);
+ return PTR_ERR_OR_ZERO(erofs_ishare_mnt);
+}
+
+void erofs_exit_ishare(void)
+{
+ kern_unmount(erofs_ishare_mnt);
+}
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 6fbe9220303a..32b57e78c9e0 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -396,6 +396,7 @@ static void erofs_default_options(struct erofs_sb_info *sbi)
enum {
Opt_user_xattr, Opt_acl, Opt_cache_strategy, Opt_dax, Opt_dax_enum,
Opt_device, Opt_fsid, Opt_domain_id, Opt_directio, Opt_fsoffset,
+ Opt_inode_share,
};
static const struct constant_table erofs_param_cache_strategy[] = {
@@ -423,6 +424,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_string("domain_id", Opt_domain_id),
fsparam_flag_no("directio", Opt_directio),
fsparam_u64("fsoffset", Opt_fsoffset),
+ fsparam_flag("inode_share", Opt_inode_share),
{}
};
@@ -524,6 +526,8 @@ static int erofs_fc_parse_param(struct fs_context *fc,
if (!sbi->fsid)
return -ENOMEM;
break;
+#endif
+#if defined(CONFIG_EROFS_FS_ONDEMAND) || defined(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)
case Opt_domain_id:
kfree_sensitive(sbi->domain_id);
sbi->domain_id = no_free_ptr(param->string);
@@ -549,6 +553,13 @@ static int erofs_fc_parse_param(struct fs_context *fc,
case Opt_fsoffset:
sbi->dif0.fsoff = result.uint_64;
break;
+ case Opt_inode_share:
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+ set_opt(&sbi->opt, INODE_SHARE);
+#else
+ errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name);
+#endif
+ break;
}
return 0;
}
@@ -647,6 +658,15 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_op = &erofs_sops;
+ if (!sbi->domain_id && test_opt(&sbi->opt, INODE_SHARE)) {
+ errorfc(fc, "domain_id is needed when inode_share is on");
+ return -EINVAL;
+ }
+ if (test_opt(&sbi->opt, DAX_ALWAYS) && test_opt(&sbi->opt, INODE_SHARE)) {
+ errorfc(fc, "FSDAX is not allowed when inode_share is on");
+ return -EINVAL;
+ }
+
sbi->blkszbits = PAGE_SHIFT;
if (!sb->s_bdev) {
/*
@@ -717,6 +737,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
erofs_info(sb, "unsupported blocksize for DAX");
clear_opt(&sbi->opt, DAX_ALWAYS);
}
+ if (test_opt(&sbi->opt, INODE_SHARE) && !erofs_sb_has_ishare_xattrs(sbi)) {
+ erofs_info(sb, "on-disk ishare xattrs not found. Turning off inode_share.");
+ clear_opt(&sbi->opt, INODE_SHARE);
+ }
+ if (test_opt(&sbi->opt, INODE_SHARE))
+ erofs_info(sb, "EXPERIMENTAL EROFS page cache share support in use. Use at your own risk!");
sb->s_time_gran = 1;
sb->s_xattr = erofs_xattr_handlers;
@@ -946,10 +972,32 @@ static struct file_system_type erofs_fs_type = {
};
MODULE_ALIAS_FS("erofs");
-#if defined(CONFIG_EROFS_FS_ONDEMAND)
+#if defined(CONFIG_EROFS_FS_ONDEMAND) || defined(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)
+static void erofs_free_anon_inode(struct inode *inode)
+{
+ struct erofs_inode *vi = EROFS_I(inode);
+
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+ kfree(vi->fingerprint.opaque);
+#endif
+ kmem_cache_free(erofs_inode_cachep, vi);
+}
+
+static const struct super_operations erofs_anon_sops = {
+ .alloc_inode = erofs_alloc_inode,
+ .drop_inode = inode_just_drop,
+ .free_inode = erofs_free_anon_inode,
+};
+
static int erofs_anon_init_fs_context(struct fs_context *fc)
{
- return init_pseudo(fc, EROFS_SUPER_MAGIC) ? 0 : -ENOMEM;
+ struct pseudo_fs_context *ctx;
+
+ ctx = init_pseudo(fc, EROFS_SUPER_MAGIC);
+ if (!ctx)
+ return -ENOMEM;
+ ctx->ops = &erofs_anon_sops;
+ return 0;
}
struct file_system_type erofs_anon_fs_type = {
@@ -984,6 +1032,10 @@ static int __init erofs_module_init(void)
if (err)
goto sysfs_err;
+ err = erofs_init_ishare();
+ if (err)
+ goto ishare_err;
+
err = register_filesystem(&erofs_fs_type);
if (err)
goto fs_err;
@@ -991,6 +1043,8 @@ static int __init erofs_module_init(void)
return 0;
fs_err:
+ erofs_exit_ishare();
+ishare_err:
erofs_exit_sysfs();
sysfs_err:
z_erofs_exit_subsystem();
@@ -1008,6 +1062,7 @@ static void __exit erofs_module_exit(void)
/* Ensure all RCU free inodes / pclusters are safe to be destroyed. */
rcu_barrier();
+ erofs_exit_ishare();
erofs_exit_sysfs();
z_erofs_exit_subsystem();
erofs_exit_shrinker();
@@ -1062,6 +1117,8 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
#endif
if (sbi->dif0.fsoff)
seq_printf(seq, ",fsoffset=%llu", sbi->dif0.fsoff);
+ if (test_opt(opt, INODE_SHARE))
+ seq_puts(seq, ",inode_share");
return 0;
}
@@ -1072,6 +1129,7 @@ static void erofs_evict_inode(struct inode *inode)
dax_break_layout_final(inode);
#endif
+ erofs_ishare_free_inode(inode);
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
}
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index ae61f20cb861..e1709059d3cc 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -577,3 +577,37 @@ struct posix_acl *erofs_get_acl(struct inode *inode, int type, bool rcu)
return acl;
}
#endif
+
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+int erofs_xattr_fill_inode_fingerprint(struct erofs_inode_fingerprint *fp,
+ struct inode *inode, const char *domain_id)
+{
+ struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
+ struct erofs_xattr_prefix_item *prefix;
+ const char *infix;
+ int valuelen, base_index;
+
+ if (!test_opt(&sbi->opt, INODE_SHARE))
+ return -EOPNOTSUPP;
+ if (!sbi->xattr_prefixes)
+ return -EINVAL;
+ prefix = sbi->xattr_prefixes + sbi->ishare_xattr_prefix_id;
+ infix = prefix->prefix->infix;
+ base_index = prefix->prefix->base_index;
+ valuelen = erofs_getxattr(inode, base_index, infix, NULL, 0);
+ if (valuelen <= 0 || valuelen > (1 << sbi->blkszbits))
+ return -EFSCORRUPTED;
+ fp->size = valuelen + (domain_id ? strlen(domain_id) : 0);
+ fp->opaque = kmalloc(fp->size, GFP_KERNEL);
+ if (!fp->opaque)
+ return -ENOMEM;
+ if (valuelen != erofs_getxattr(inode, base_index, infix,
+ fp->opaque, valuelen)) {
+ kfree(fp->opaque);
+ fp->opaque = NULL;
+ return -EFSCORRUPTED;
+ }
+ memcpy(fp->opaque + valuelen, domain_id, fp->size - valuelen);
+ return 0;
+}
+#endif
diff --git a/fs/erofs/xattr.h b/fs/erofs/xattr.h
index 6317caa8413e..bf75a580b8f1 100644
--- a/fs/erofs/xattr.h
+++ b/fs/erofs/xattr.h
@@ -67,4 +67,7 @@ struct posix_acl *erofs_get_acl(struct inode *inode, int type, bool rcu);
#define erofs_get_acl (NULL)
#endif
+int erofs_xattr_fill_inode_fingerprint(struct erofs_inode_fingerprint *fp,
+ struct inode *inode, const char *domain_id);
+
#endif
--
2.22.0
* [PATCH v16 07/10] erofs: pass inode to trace_erofs_read_folio
2026-01-22 13:37 [PATCH v16 00/10] erofs: Introduce page cache sharing feature Hongbo Li
` (5 preceding siblings ...)
2026-01-22 13:37 ` [PATCH v16 06/10] erofs: introduce the page cache share feature Hongbo Li
@ 2026-01-22 13:37 ` Hongbo Li
2026-01-22 13:37 ` [PATCH v16 08/10] erofs: support unencoded inodes for page cache share Hongbo Li
` (2 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
trace_erofs_read_folio() obtains inode information through the folio,
but this fails if the real inode is not associated with the folio
(as in the upcoming page cache sharing case). Therefore, pass the
real inode to it explicitly so that the inode information can still
be printed in that case.
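The difference can be sketched with a small userspace model (not the kernel
tracepoint): when a folio belongs to a shared inode, deriving the inode from
the folio reports the shared inode, whereas passing the inode explicitly
reports the file that was actually read. All `model_*` types and both helper
functions are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode {
	unsigned long long nid;
};

struct model_folio {
	struct model_inode *host;	/* owner of the page cache */
	unsigned long index;
};

struct model_trace {
	unsigned long long nid;
	unsigned long index;
};

/* Old style: the inode is derived from the folio's owner. */
static struct model_trace trace_from_folio(const struct model_folio *folio)
{
	struct model_trace t;

	assert(folio && folio->host);
	t.nid = folio->host->nid;
	t.index = folio->index;
	return t;
}

/* New style: the caller says which inode the read is for. */
static struct model_trace trace_with_inode(const struct model_inode *inode,
					   const struct model_folio *folio)
{
	struct model_trace t;

	assert(inode && folio);
	t.nid = inode->nid;
	t.index = folio->index;
	return t;
}
```

In this model, a folio owned by a shared inode with nid 0 that is read on
behalf of a real inode with nid 42 is traced as nid 0 by the first helper and
as nid 42 by the second.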
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
---
fs/erofs/data.c | 6 ++----
fs/erofs/fileio.c | 2 +-
fs/erofs/zdata.c | 2 +-
include/trace/events/erofs.h | 10 +++++-----
4 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 71e23d91123d..ea198defb531 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -385,8 +385,7 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
};
struct erofs_iomap_iter_ctx iter_ctx = {};
- trace_erofs_read_folio(folio, true);
-
+ trace_erofs_read_folio(folio_inode(folio), folio, true);
iomap_read_folio(&erofs_iomap_ops, &read_ctx, &iter_ctx);
return 0;
}
@@ -400,8 +399,7 @@ static void erofs_readahead(struct readahead_control *rac)
struct erofs_iomap_iter_ctx iter_ctx = {};
trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
- readahead_count(rac), true);
-
+ readahead_count(rac), true);
iomap_readahead(&erofs_iomap_ops, &read_ctx, &iter_ctx);
}
diff --git a/fs/erofs/fileio.c b/fs/erofs/fileio.c
index 932e8b353ba1..d07dc248d264 100644
--- a/fs/erofs/fileio.c
+++ b/fs/erofs/fileio.c
@@ -161,7 +161,7 @@ static int erofs_fileio_read_folio(struct file *file, struct folio *folio)
struct erofs_fileio io = {};
int err;
- trace_erofs_read_folio(folio, true);
+ trace_erofs_read_folio(folio_inode(folio), folio, true);
err = erofs_fileio_scan_folio(&io, folio);
erofs_fileio_rq_submit(io.rq);
return err;
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 3d31f7840ca0..93ab6a481b64 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1887,7 +1887,7 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
Z_EROFS_DEFINE_FRONTEND(f, inode, folio_pos(folio));
int err;
- trace_erofs_read_folio(folio, false);
+ trace_erofs_read_folio(inode, folio, false);
z_erofs_pcluster_readmore(&f, NULL, true);
err = z_erofs_scan_folio(&f, folio, false);
z_erofs_pcluster_readmore(&f, NULL, false);
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
index dad7360f42f9..def20d06507b 100644
--- a/include/trace/events/erofs.h
+++ b/include/trace/events/erofs.h
@@ -82,9 +82,9 @@ TRACE_EVENT(erofs_fill_inode,
TRACE_EVENT(erofs_read_folio,
- TP_PROTO(struct folio *folio, bool raw),
+ TP_PROTO(struct inode *inode, struct folio *folio, bool raw),
- TP_ARGS(folio, raw),
+ TP_ARGS(inode, folio, raw),
TP_STRUCT__entry(
__field(dev_t, dev )
@@ -96,9 +96,9 @@ TRACE_EVENT(erofs_read_folio,
),
TP_fast_assign(
- __entry->dev = folio->mapping->host->i_sb->s_dev;
- __entry->nid = EROFS_I(folio->mapping->host)->nid;
- __entry->dir = S_ISDIR(folio->mapping->host->i_mode);
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->nid = EROFS_I(inode)->nid;
+ __entry->dir = S_ISDIR(inode->i_mode);
__entry->index = folio->index;
__entry->uptodate = folio_test_uptodate(folio);
__entry->raw = raw;
--
2.22.0
* [PATCH v16 08/10] erofs: support unencoded inodes for page cache share
2026-01-22 13:37 [PATCH v16 00/10] erofs: Introduce page cache sharing feature Hongbo Li
` (6 preceding siblings ...)
2026-01-22 13:37 ` [PATCH v16 07/10] erofs: pass inode to trace_erofs_read_folio Hongbo Li
@ 2026-01-22 13:37 ` Hongbo Li
2026-01-22 13:37 ` [PATCH v16 09/10] erofs: support compressed " Hongbo Li
2026-01-22 13:37 ` [PATCH v16 10/10] erofs: implement .fadvise " Hongbo Li
9 siblings, 0 replies; 17+ messages in thread
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
This patch adds inode page cache sharing functionality for unencoded
files.
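For unencoded files, the read path has to map a shared inode back to some
real inode in order to look up block mappings. The following userspace
sketch models that "pick any live member" step under simplified assumptions:
a singly linked list stands in for the kernel list, and a `being_freed` flag
stands in for a failed igrab(). The names `model_real_inode`,
`pick_real_inode` and `put_real_inode` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_real_inode {
	bool being_freed;		/* models a failing igrab() */
	int refcount;
	struct model_real_inode *next;
};

/* Return the first member that can still be referenced, or NULL. */
static struct model_real_inode *pick_real_inode(struct model_real_inode *head)
{
	struct model_real_inode *it;

	for (it = head; it; it = it->next) {
		if (!it->being_freed) {
			it->refcount++;	/* caller must drop this reference */
			return it;
		}
	}
	return NULL;
}

/* Drop the reference taken by pick_real_inode(). */
static void put_real_inode(struct model_real_inode *inode)
{
	assert(inode && inode->refcount > 0);
	inode->refcount--;
}
```

The result pointer starts out well defined (NULL when nothing is found), so
the caller can test it even when the list is empty.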
I conducted experiments in the container environment. Below is the
memory usage for reading all files in two different minor versions
of container images:
+-------------------+------------------+-------------+---------------+
| Image | Page Cache Share | Memory (MB) | Memory |
| | | | Reduction (%) |
+-------------------+------------------+-------------+---------------+
| | No | 241 | - |
| redis +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 163 | 33% |
+-------------------+------------------+-------------+---------------+
| | No | 872 | - |
| postgres +------------------+-------------+---------------+
| 16.1 & 16.2 | Yes | 630 | 28% |
+-------------------+------------------+-------------+---------------+
| | No | 2771 | - |
| tensorflow +------------------+-------------+---------------+
| 2.11.0 & 2.11.1 | Yes | 2340 | 16% |
+-------------------+------------------+-------------+---------------+
| | No | 926 | - |
| mysql +------------------+-------------+---------------+
| 8.0.11 & 8.0.12 | Yes | 735 | 21% |
+-------------------+------------------+-------------+---------------+
| | No | 390 | - |
| nginx +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 219 | 44% |
+-------------------+------------------+-------------+---------------+
| tomcat | No | 924 | - |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
| | Yes | 474 | 49% |
+-------------------+------------------+-------------+---------------+
Additionally, the table below shows the runtime memory usage of the
container:
+-------------------+------------------+-------------+---------------+
| Image | Page Cache Share | Memory (MB) | Memory |
| | | | Reduction (%) |
+-------------------+------------------+-------------+---------------+
| | No | 35 | - |
| redis +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 28 | 20% |
+-------------------+------------------+-------------+---------------+
| | No | 149 | - |
| postgres +------------------+-------------+---------------+
| 16.1 & 16.2 | Yes | 95 | 37% |
+-------------------+------------------+-------------+---------------+
| | No | 1028 | - |
| tensorflow +------------------+-------------+---------------+
| 2.11.0 & 2.11.1 | Yes | 930 | 10% |
+-------------------+------------------+-------------+---------------+
| | No | 155 | - |
| mysql +------------------+-------------+---------------+
| 8.0.11 & 8.0.12 | Yes | 132 | 15% |
+-------------------+------------------+-------------+---------------+
| | No | 25 | - |
| nginx +------------------+-------------+---------------+
| 7.2.4 & 7.2.5 | Yes | 20 | 20% |
+-------------------+------------------+-------------+---------------+
| tomcat | No | 186 | - |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
| | Yes | 98 | 48% |
+-------------------+------------------+-------------+---------------+
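The reduction column in both tables can be recomputed from the two memory
figures. The sketch below is only an illustration: the helper name
`reduction_percent_ceil` is hypothetical, and the rounding mode is an
assumption inferred from the numbers (rounding up reproduces the listed
percentages, e.g. 241 MB to 163 MB gives 32.4%, listed as 33%).

```c
#include <assert.h>

/* Percentage saved when usage drops from before_mb to after_mb, rounded up. */
static unsigned int reduction_percent_ceil(unsigned int before_mb,
					   unsigned int after_mb)
{
	assert(before_mb > 0 && after_mb <= before_mb);
	return ((before_mb - after_mb) * 100 + before_mb - 1) / before_mb;
}
```

For example, `reduction_percent_ceil(924, 474)` corresponds to the tomcat row
of the first table and `reduction_percent_ceil(149, 95)` to the postgres row
of the second.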
Co-developed-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
---
fs/erofs/data.c | 32 +++++++++++++++++++++++---------
fs/erofs/fileio.c | 25 ++++++++++++++++---------
fs/erofs/inode.c | 3 ++-
fs/erofs/internal.h | 6 ++++++
fs/erofs/ishare.c | 34 ++++++++++++++++++++++++++++++++++
5 files changed, 81 insertions(+), 19 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index ea198defb531..3a4eb0dececd 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -269,6 +269,7 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
struct erofs_iomap_iter_ctx {
struct page *page;
void *base;
+ struct inode *realinode;
};
static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
@@ -276,14 +277,15 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
{
struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
struct erofs_iomap_iter_ctx *ctx = iter->private;
- struct super_block *sb = inode->i_sb;
+ struct inode *realinode = ctx ? ctx->realinode : inode;
+ struct super_block *sb = realinode->i_sb;
struct erofs_map_blocks map;
struct erofs_map_dev mdev;
int ret;
map.m_la = offset;
map.m_llen = length;
- ret = erofs_map_blocks(inode, &map);
+ ret = erofs_map_blocks(realinode, &map);
if (ret < 0)
return ret;
@@ -296,7 +298,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
return 0;
}
- if (!(map.m_flags & EROFS_MAP_META) || !erofs_inode_in_metabox(inode)) {
+ if (!(map.m_flags & EROFS_MAP_META) || !erofs_inode_in_metabox(realinode)) {
mdev = (struct erofs_map_dev) {
.m_deviceid = map.m_deviceid,
.m_pa = map.m_pa,
@@ -322,7 +324,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
void *ptr;
ptr = erofs_read_metabuf(&buf, sb, map.m_pa,
- erofs_inode_in_metabox(inode));
+ erofs_inode_in_metabox(realinode));
if (IS_ERR(ptr))
return PTR_ERR(ptr);
iomap->inline_data = ptr;
@@ -383,10 +385,15 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
.ops = &iomap_bio_read_ops,
.cur_folio = folio,
};
- struct erofs_iomap_iter_ctx iter_ctx = {};
+ bool need_iput;
+ struct erofs_iomap_iter_ctx iter_ctx = {
+ .realinode = erofs_real_inode(folio_inode(folio), &need_iput),
+ };
- trace_erofs_read_folio(folio_inode(folio), folio, true);
+ trace_erofs_read_folio(iter_ctx.realinode, folio, true);
iomap_read_folio(&erofs_iomap_ops, &read_ctx, &iter_ctx);
+ if (need_iput)
+ iput(iter_ctx.realinode);
return 0;
}
@@ -396,11 +403,16 @@ static void erofs_readahead(struct readahead_control *rac)
.ops = &iomap_bio_read_ops,
.rac = rac,
};
- struct erofs_iomap_iter_ctx iter_ctx = {};
+ bool need_iput;
+ struct erofs_iomap_iter_ctx iter_ctx = {
+ .realinode = erofs_real_inode(rac->mapping->host, &need_iput),
+ };
- trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
+ trace_erofs_readahead(iter_ctx.realinode, readahead_index(rac),
readahead_count(rac), true);
iomap_readahead(&erofs_iomap_ops, &read_ctx, &iter_ctx);
+ if (need_iput)
+ iput(iter_ctx.realinode);
}
static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
@@ -421,7 +433,9 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
return dax_iomap_rw(iocb, to, &erofs_iomap_ops);
#endif
if ((iocb->ki_flags & IOCB_DIRECT) && inode->i_sb->s_bdev) {
- struct erofs_iomap_iter_ctx iter_ctx = {};
+ struct erofs_iomap_iter_ctx iter_ctx = {
+ .realinode = inode,
+ };
return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
NULL, 0, &iter_ctx, 0);
diff --git a/fs/erofs/fileio.c b/fs/erofs/fileio.c
index d07dc248d264..c1d0081609dc 100644
--- a/fs/erofs/fileio.c
+++ b/fs/erofs/fileio.c
@@ -88,9 +88,9 @@ void erofs_fileio_submit_bio(struct bio *bio)
bio));
}
-static int erofs_fileio_scan_folio(struct erofs_fileio *io, struct folio *folio)
+static int erofs_fileio_scan_folio(struct erofs_fileio *io,
+ struct inode *inode, struct folio *folio)
{
- struct inode *inode = folio_inode(folio);
struct erofs_map_blocks *map = &io->map;
unsigned int cur = 0, end = folio_size(folio), len, attached = 0;
loff_t pos = folio_pos(folio), ofs;
@@ -158,31 +158,38 @@ static int erofs_fileio_scan_folio(struct erofs_fileio *io, struct folio *folio)
static int erofs_fileio_read_folio(struct file *file, struct folio *folio)
{
+ bool need_iput;
+ struct inode *realinode = erofs_real_inode(folio_inode(folio), &need_iput);
struct erofs_fileio io = {};
int err;
- trace_erofs_read_folio(folio_inode(folio), folio, true);
- err = erofs_fileio_scan_folio(&io, folio);
+ trace_erofs_read_folio(realinode, folio, true);
+ err = erofs_fileio_scan_folio(&io, realinode, folio);
erofs_fileio_rq_submit(io.rq);
+ if (need_iput)
+ iput(realinode);
return err;
}
static void erofs_fileio_readahead(struct readahead_control *rac)
{
- struct inode *inode = rac->mapping->host;
+ bool need_iput;
+ struct inode *realinode = erofs_real_inode(rac->mapping->host, &need_iput);
struct erofs_fileio io = {};
struct folio *folio;
int err;
- trace_erofs_readahead(inode, readahead_index(rac),
+ trace_erofs_readahead(realinode, readahead_index(rac),
readahead_count(rac), true);
while ((folio = readahead_folio(rac))) {
- err = erofs_fileio_scan_folio(&io, folio);
+ err = erofs_fileio_scan_folio(&io, realinode, folio);
if (err && err != -EINTR)
- erofs_err(inode->i_sb, "readahead error at folio %lu @ nid %llu",
- folio->index, EROFS_I(inode)->nid);
+ erofs_err(realinode->i_sb, "readahead error at folio %lu @ nid %llu",
+ folio->index, EROFS_I(realinode)->nid);
}
erofs_fileio_rq_submit(io.rq);
+ if (need_iput)
+ iput(realinode);
}
const struct address_space_operations erofs_fileio_aops = {
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 202cbbb4eada..d33816cff813 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -213,7 +213,8 @@ static int erofs_fill_inode(struct inode *inode)
switch (inode->i_mode & S_IFMT) {
case S_IFREG:
inode->i_op = &erofs_generic_iops;
- inode->i_fop = &erofs_file_fops;
+ inode->i_fop = erofs_ishare_fill_inode(inode) ?
+ &erofs_ishare_fops : &erofs_file_fops;
break;
case S_IFDIR:
inode->i_op = &erofs_dir_iops;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 1061faf43868..ec6959e22732 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -588,11 +588,17 @@ int __init erofs_init_ishare(void);
void erofs_exit_ishare(void);
bool erofs_ishare_fill_inode(struct inode *inode);
void erofs_ishare_free_inode(struct inode *inode);
+struct inode *erofs_real_inode(struct inode *inode, bool *need_iput);
#else
static inline int erofs_init_ishare(void) { return 0; }
static inline void erofs_exit_ishare(void) {}
static inline bool erofs_ishare_fill_inode(struct inode *inode) { return false; }
static inline void erofs_ishare_free_inode(struct inode *inode) {}
+static inline struct inode *erofs_real_inode(struct inode *inode, bool *need_iput)
+{
+ *need_iput = false;
+ return inode;
+}
#endif
long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
index 3d26b2826710..ab459fb62473 100644
--- a/fs/erofs/ishare.c
+++ b/fs/erofs/ishare.c
@@ -11,6 +11,12 @@
static struct vfsmount *erofs_ishare_mnt;
+static inline bool erofs_is_ishare_inode(struct inode *inode)
+{
+ /* assumes FS_ONDEMAND is mutually exclusive with the FS_PAGE_CACHE_SHARE feature */
+ return inode->i_sb->s_type == &erofs_anon_fs_type;
+}
+
static int erofs_ishare_iget5_eq(struct inode *inode, void *data)
{
struct erofs_inode_fingerprint *fp1 = &EROFS_I(inode)->fingerprint;
@@ -38,6 +44,8 @@ bool erofs_ishare_fill_inode(struct inode *inode)
struct inode *sharedinode;
unsigned long hash;
+ if (erofs_inode_is_data_compressed(vi->datalayout))
+ return false;
if (erofs_xattr_fill_inode_fingerprint(&fp, inode, sbi->domain_id))
return false;
hash = xxh32(fp.opaque, fp.size, 0);
@@ -155,6 +163,32 @@ const struct file_operations erofs_ishare_fops = {
.splice_read = filemap_splice_read,
};
+struct inode *erofs_real_inode(struct inode *inode, bool *need_iput)
+{
+ struct erofs_inode *vi, *vi_share;
+ struct inode *realinode = NULL;
+
+ *need_iput = false;
+ if (!erofs_is_ishare_inode(inode))
+ return inode;
+
+ vi_share = EROFS_I(inode);
+ spin_lock(&vi_share->ishare_lock);
+ /* fetch any one as real inode */
+ DBG_BUGON(list_empty(&vi_share->ishare_list));
+ list_for_each_entry(vi, &vi_share->ishare_list, ishare_list) {
+ realinode = igrab(&vi->vfs_inode);
+ if (realinode) {
+ *need_iput = true;
+ break;
+ }
+ }
+ spin_unlock(&vi_share->ishare_lock);
+
+ DBG_BUGON(!realinode);
+ return realinode;
+}
+
int __init erofs_init_ishare(void)
{
erofs_ishare_mnt = kern_mount(&erofs_anon_fs_type);
--
2.22.0
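The `need_iput` contract of erofs_real_inode() above can be modelled in user space. The following is only a sketch under stated assumptions: `demo_inode`, `demo_igrab()`, `demo_iput()` and `demo_real_inode()` are hypothetical stand-ins, not kernel APIs; the list is a plain singly linked list, and the spinlock that the patch takes around the walk is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a refcounted inode. */
struct demo_inode {
	int refcount;               /* references currently held */
	bool dying;                 /* models an inode that a grab would refuse */
	bool shared;                /* true for the anonymous shared inode */
	struct demo_inode *members; /* shared inode only: first real inode */
	struct demo_inode *next;    /* real inode only: next list member */
};

/* Take a reference unless the inode is being torn down. */
static struct demo_inode *demo_igrab(struct demo_inode *inode)
{
	if (inode->dying)
		return NULL;
	inode->refcount++;
	return inode;
}

/* Drop a reference taken by demo_igrab(). */
static void demo_iput(struct demo_inode *inode)
{
	assert(inode->refcount > 0);
	inode->refcount--;
}

/*
 * Resolve the inode to read from. *need_put tells the caller whether a
 * reference was taken on its behalf. Locking is omitted in this sketch.
 */
static struct demo_inode *demo_real_inode(struct demo_inode *inode,
					  bool *need_put)
{
	struct demo_inode *real = NULL;
	struct demo_inode *cur;

	*need_put = false;
	if (!inode->shared)
		return inode;	/* ordinary inode: nothing grabbed */

	for (cur = inode->members; cur; cur = cur->next) {
		real = demo_igrab(cur);
		if (real) {
			*need_put = true;
			break;
		}
	}
	return real;		/* NULL if no member could be grabbed */
}
```

A caller follows the same shape as the read paths in the patch: resolve the inode, do the I/O, then call demo_iput() only when `need_put` was set.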
* [PATCH v16 09/10] erofs: support compressed inodes for page cache share
2026-01-22 13:37 [PATCH v16 00/10] erofs: Introduce page cache sharing feature Hongbo Li
` (7 preceding siblings ...)
2026-01-22 13:37 ` [PATCH v16 08/10] erofs: support unencoded inodes for page cache share Hongbo Li
@ 2026-01-22 13:37 ` Hongbo Li
2026-01-22 13:37 ` [PATCH v16 10/10] erofs: implement .fadvise " Hongbo Li
9 siblings, 0 replies; 17+ messages in thread
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
From: Hongzhen Luo <hongzhen@linux.alibaba.com>
This patch adds page cache sharing functionality for compressed inodes.
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
fs/erofs/ishare.c | 2 --
fs/erofs/zdata.c | 38 ++++++++++++++++++++++++--------------
2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
index ab459fb62473..ad53a57dbcbc 100644
--- a/fs/erofs/ishare.c
+++ b/fs/erofs/ishare.c
@@ -44,8 +44,6 @@ bool erofs_ishare_fill_inode(struct inode *inode)
struct inode *sharedinode;
unsigned long hash;
- if (erofs_inode_is_data_compressed(vi->datalayout))
- return false;
if (erofs_xattr_fill_inode_fingerprint(&fp, inode, sbi->domain_id))
return false;
hash = xxh32(fp.opaque, fp.size, 0);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 93ab6a481b64..59ee9a36d9eb 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -493,7 +493,7 @@ enum z_erofs_pclustermode {
};
struct z_erofs_frontend {
- struct inode *const inode;
+ struct inode *inode, *sharedinode;
struct erofs_map_blocks map;
struct z_erofs_bvec_iter biter;
@@ -508,8 +508,8 @@ struct z_erofs_frontend {
unsigned int icur;
};
-#define Z_EROFS_DEFINE_FRONTEND(fe, i, ho) struct z_erofs_frontend fe = { \
- .inode = i, .head = Z_EROFS_PCLUSTER_TAIL, \
+#define Z_EROFS_DEFINE_FRONTEND(fe, i, si, ho) struct z_erofs_frontend fe = { \
+ .inode = i, .sharedinode = si, .head = Z_EROFS_PCLUSTER_TAIL, \
.mode = Z_EROFS_PCLUSTER_FOLLOWED, .headoffset = ho }
static bool z_erofs_should_alloc_cache(struct z_erofs_frontend *fe)
@@ -1866,7 +1866,7 @@ static void z_erofs_pcluster_readmore(struct z_erofs_frontend *f,
pgoff_t index = cur >> PAGE_SHIFT;
struct folio *folio;
- folio = erofs_grab_folio_nowait(inode->i_mapping, index);
+ folio = erofs_grab_folio_nowait(f->sharedinode->i_mapping, index);
if (!IS_ERR_OR_NULL(folio)) {
if (folio_test_uptodate(folio))
folio_unlock(folio);
@@ -1883,11 +1883,13 @@ static void z_erofs_pcluster_readmore(struct z_erofs_frontend *f,
static int z_erofs_read_folio(struct file *file, struct folio *folio)
{
- struct inode *const inode = folio->mapping->host;
- Z_EROFS_DEFINE_FRONTEND(f, inode, folio_pos(folio));
+ struct inode *sharedinode = folio->mapping->host;
+ bool need_iput;
+ struct inode *realinode = erofs_real_inode(sharedinode, &need_iput);
+ Z_EROFS_DEFINE_FRONTEND(f, realinode, sharedinode, folio_pos(folio));
int err;
- trace_erofs_read_folio(inode, folio, false);
+ trace_erofs_read_folio(realinode, folio, false);
z_erofs_pcluster_readmore(&f, NULL, true);
err = z_erofs_scan_folio(&f, folio, false);
z_erofs_pcluster_readmore(&f, NULL, false);
@@ -1896,23 +1898,28 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
/* if some pclusters are ready, need submit them anyway */
err = z_erofs_runqueue(&f, 0) ?: err;
if (err && err != -EINTR)
- erofs_err(inode->i_sb, "read error %d @ %lu of nid %llu",
- err, folio->index, EROFS_I(inode)->nid);
+ erofs_err(realinode->i_sb, "read error %d @ %lu of nid %llu",
+ err, folio->index, EROFS_I(realinode)->nid);
erofs_put_metabuf(&f.map.buf);
erofs_release_pages(&f.pagepool);
+
+ if (need_iput)
+ iput(realinode);
return err;
}
static void z_erofs_readahead(struct readahead_control *rac)
{
- struct inode *const inode = rac->mapping->host;
- Z_EROFS_DEFINE_FRONTEND(f, inode, readahead_pos(rac));
+ struct inode *sharedinode = rac->mapping->host;
+ bool need_iput;
+ struct inode *realinode = erofs_real_inode(sharedinode, &need_iput);
+ Z_EROFS_DEFINE_FRONTEND(f, realinode, sharedinode, readahead_pos(rac));
unsigned int nrpages = readahead_count(rac);
struct folio *head = NULL, *folio;
int err;
- trace_erofs_readahead(inode, readahead_index(rac), nrpages, false);
+ trace_erofs_readahead(realinode, readahead_index(rac), nrpages, false);
z_erofs_pcluster_readmore(&f, rac, true);
while ((folio = readahead_folio(rac))) {
folio->private = head;
@@ -1926,8 +1933,8 @@ static void z_erofs_readahead(struct readahead_control *rac)
err = z_erofs_scan_folio(&f, folio, true);
if (err && err != -EINTR)
- erofs_err(inode->i_sb, "readahead error at folio %lu @ nid %llu",
- folio->index, EROFS_I(inode)->nid);
+ erofs_err(realinode->i_sb, "readahead error at folio %lu @ nid %llu",
+ folio->index, EROFS_I(realinode)->nid);
}
z_erofs_pcluster_readmore(&f, rac, false);
z_erofs_pcluster_end(&f);
@@ -1935,6 +1942,9 @@ static void z_erofs_readahead(struct readahead_control *rac)
(void)z_erofs_runqueue(&f, nrpages);
erofs_put_metabuf(&f.map.buf);
erofs_release_pages(&f.pagepool);
+
+ if (need_iput)
+ iput(realinode);
}
const struct address_space_operations z_erofs_aops = {
--
2.22.0
* [PATCH v16 10/10] erofs: implement .fadvise for page cache share
2026-01-22 13:37 [PATCH v16 00/10] erofs: Introduce page cache sharing feature Hongbo Li
` (8 preceding siblings ...)
2026-01-22 13:37 ` [PATCH v16 09/10] erofs: support compressed " Hongbo Li
@ 2026-01-22 13:37 ` Hongbo Li
9 siblings, 0 replies; 17+ messages in thread
From: Hongbo Li @ 2026-01-22 13:37 UTC (permalink / raw)
To: hsiangkao, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel,
lihongbo22
From: Hongzhen Luo <hongzhen@linux.alibaba.com>
This patch implements the .fadvise interface for page cache share.
Similar to overlayfs, it drops those clean, unused pages through
vfs_fadvise().
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
---
fs/erofs/ishare.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
index ad53a57dbcbc..ce980320a8b9 100644
--- a/fs/erofs/ishare.c
+++ b/fs/erofs/ishare.c
@@ -151,6 +151,12 @@ static int erofs_ishare_mmap(struct file *file, struct vm_area_struct *vma)
return generic_file_readonly_mmap(file, vma);
}
+static int erofs_ishare_fadvise(struct file *file, loff_t offset,
+ loff_t len, int advice)
+{
+ return vfs_fadvise(file->private_data, offset, len, advice);
+}
+
const struct file_operations erofs_ishare_fops = {
.open = erofs_ishare_file_open,
.llseek = generic_file_llseek,
@@ -159,6 +165,7 @@ const struct file_operations erofs_ishare_fops = {
.release = erofs_ishare_file_release,
.get_unmapped_area = thp_get_unmapped_area,
.splice_read = filemap_splice_read,
+ .fadvise = erofs_ishare_fadvise,
};
struct inode *erofs_real_inode(struct inode *inode, bool *need_iput)
--
2.22.0
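From user space, the .fadvise hook above is reached through posix_fadvise(2), which the kernel dispatches via vfs_fadvise(). A minimal sketch of a caller that asks for cached pages to be dropped follows; `demo_drop_cached_pages()` is a hypothetical helper name, and which pages actually get dropped is up to the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Ask the kernel to drop cached pages for [offset, offset + len) of fd;
 * len == 0 means "through the end of the file". Returns 0 on success or
 * a positive error number (posix_fadvise() does not set errno).
 */
static int demo_drop_cached_pages(int fd, off_t offset, off_t len)
{
	int rc = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);

	if (rc != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
	return rc;
}
```

With this series applied, such a call on a file opened from an `inode_share` mount would be forwarded to the backing file stored in `file->private_data`, as erofs_ishare_fadvise() shows.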
* Re: [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops.
2026-01-22 13:37 ` [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops Hongbo Li
@ 2026-01-22 13:54 ` Gao Xiang
2026-01-23 6:18 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2026-01-22 13:54 UTC (permalink / raw)
To: Hongbo Li, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel
On 2026/1/22 21:37, Hongbo Li wrote:
> Add erofs_inode_set_aops helper to set the inode->i_mapping->a_ops,
> and using IS_ENABLED to make it cleaner.
>
> Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
> ---
> fs/erofs/inode.c | 23 +----------------------
> fs/erofs/internal.h | 23 +++++++++++++++++++++++
> 2 files changed, 24 insertions(+), 22 deletions(-)
>
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index bce98c845a18..389632bb46c4 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -235,28 +235,7 @@ static int erofs_fill_inode(struct inode *inode)
> }
>
> mapping_set_large_folios(inode->i_mapping);
> - if (erofs_inode_is_data_compressed(vi->datalayout)) {
> -#ifdef CONFIG_EROFS_FS_ZIP
> - DO_ONCE_LITE_IF(inode->i_blkbits != PAGE_SHIFT,
> - erofs_info, inode->i_sb,
> - "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
> - inode->i_mapping->a_ops = &z_erofs_aops;
> -#else
> - err = -EOPNOTSUPP;
> -#endif
> - } else {
> - inode->i_mapping->a_ops = &erofs_aops;
> -#ifdef CONFIG_EROFS_FS_ONDEMAND
> - if (erofs_is_fscache_mode(inode->i_sb))
> - inode->i_mapping->a_ops = &erofs_fscache_access_aops;
> -#endif
> -#ifdef CONFIG_EROFS_FS_BACKED_BY_FILE
> - if (erofs_is_fileio_mode(EROFS_SB(inode->i_sb)))
> - inode->i_mapping->a_ops = &erofs_fileio_aops;
> -#endif
> - }
> -
> - return err;
> + return erofs_inode_set_aops(inode, inode, false);
> }
>
> /*
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index ec79e8b44d3b..8e28c2fa8735 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -455,6 +455,29 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
> return NULL;
> }
>
> +static inline int erofs_inode_set_aops(struct inode *inode,
> + struct inode *realinode, bool no_fscache)
> +{
> + if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
> + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
> + return -EOPNOTSUPP;
> + DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
> + erofs_info, realinode->i_sb,
> + "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
> + inode->i_mapping->a_ops = &z_erofs_aops;
Is that available if CONFIG_EROFS_FS_ZIP is undefined?
> + return 0;
> + }
> + inode->i_mapping->a_ops = &erofs_aops;
> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) {
> + if (!no_fscache && erofs_is_fscache_mode(realinode->i_sb))
> + inode->i_mapping->a_ops = &erofs_fscache_access_aops;
> + } else {
I really don't think they are equivalent. Could you just move
the code without any change?
Thanks,
Gao Xiang
> + if (erofs_is_fileio_mode(EROFS_SB(realinode->i_sb)))
> + inode->i_mapping->a_ops = &erofs_fileio_aops;
> + }
> + return 0;
> +}
> +
> int erofs_register_sysfs(struct super_block *sb);
> void erofs_unregister_sysfs(struct super_block *sb);
> int __init erofs_init_sysfs(void);
* Re: [PATCH v16 06/10] erofs: introduce the page cache share feature
2026-01-22 13:37 ` [PATCH v16 06/10] erofs: introduce the page cache share feature Hongbo Li
@ 2026-01-22 14:01 ` Gao Xiang
2026-01-22 15:09 ` Hongbo Li
0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2026-01-22 14:01 UTC (permalink / raw)
To: Hongbo Li, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel
On 2026/1/22 21:37, Hongbo Li wrote:
> From: Hongzhen Luo <hongzhen@linux.alibaba.com>
>
> Currently, reading files with different paths (or names) but the same
> content will consume multiple copies of the page cache, even if the
> content of these page caches is the same. For example, reading
> identical files (e.g., *.so files) from two different minor versions of
> container images will cost multiple copies of the same page cache,
> since different containers have different mount points. Therefore,
> sharing the page cache for files with the same content can save memory.
>
> This introduces the page cache share feature in erofs. It allocate a
> shared inode and use its page cache as shared. Reads for files
> with identical content will ultimately be routed to the page cache of
> the shared inode. In this way, a single page cache satisfies
> multiple read requests for different files with the same contents.
>
> We introduce new mount option `inode_share` to enable the page
> sharing mode during mounting. This option is used in conjunction
> with `domain_id` to share the page cache within the same trusted
> domain.
>
> Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
> Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
> ---
> Documentation/filesystems/erofs.rst | 5 +
> fs/erofs/Makefile | 1 +
> fs/erofs/inode.c | 1 -
> fs/erofs/internal.h | 31 ++++++
> fs/erofs/ishare.c | 167 ++++++++++++++++++++++++++++
> fs/erofs/super.c | 62 ++++++++++-
> fs/erofs/xattr.c | 34 ++++++
> fs/erofs/xattr.h | 3 +
> 8 files changed, 301 insertions(+), 3 deletions(-)
> create mode 100644 fs/erofs/ishare.c
>
> diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
> index 40dbf3b6a35f..bfef8e87f299 100644
> --- a/Documentation/filesystems/erofs.rst
> +++ b/Documentation/filesystems/erofs.rst
> @@ -129,7 +129,12 @@ fsid=%s Specify a filesystem image ID for Fscache back-end.
> domain_id=%s Specify a trusted domain ID for fscache mode so that
> different images with the same blobs, identified by blob IDs,
> can share storage within the same trusted domain.
> + Also used for different filesystems with inode page sharing
> + enabled to share page cache within the trusted domain.
> fsoffset=%llu Specify block-aligned filesystem offset for the primary device.
> +inode_share Enable inode page sharing for this filesystem. Inodes with
> + identical content within the same domain ID can share the
> + page cache.
> =================== =========================================================
>
> Sysfs Entries
> diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
> index 549abc424763..a80e1762b607 100644
> --- a/fs/erofs/Makefile
> +++ b/fs/erofs/Makefile
> @@ -10,3 +10,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
> erofs-$(CONFIG_EROFS_FS_ZIP_ACCEL) += decompressor_crypto.o
> erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
> erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
> +erofs-$(CONFIG_EROFS_FS_PAGE_CACHE_SHARE) += ishare.o
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index 389632bb46c4..202cbbb4eada 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -203,7 +203,6 @@ static int erofs_read_inode(struct inode *inode)
>
> static int erofs_fill_inode(struct inode *inode)
> {
> - struct erofs_inode *vi = EROFS_I(inode);
Why is this line in this patch rather than in
"erofs: add erofs_inode_set_aops helper to set the aops[.]"?
Also, there is an unneeded dot at the end of that subject.
Could you check the patches carefully before sending
out the next version?
Thanks,
Gao Xiang
* Re: [PATCH v16 06/10] erofs: introduce the page cache share feature
2026-01-22 14:01 ` Gao Xiang
@ 2026-01-22 15:09 ` Hongbo Li
0 siblings, 0 replies; 17+ messages in thread
From: Hongbo Li @ 2026-01-22 15:09 UTC (permalink / raw)
To: Gao Xiang, chao, brauner
Cc: hch, djwong, amir73il, linux-fsdevel, linux-erofs, linux-kernel
On 2026/1/22 22:01, Gao Xiang wrote:
>
>
> On 2026/1/22 21:37, Hongbo Li wrote:
>> From: Hongzhen Luo <hongzhen@linux.alibaba.com>
>>
>> Currently, reading files with different paths (or names) but the same
>> content will consume multiple copies of the page cache, even if the
>> content of these page caches is the same. For example, reading
>> identical files (e.g., *.so files) from two different minor versions of
>> container images will cost multiple copies of the same page cache,
>> since different containers have different mount points. Therefore,
>> sharing the page cache for files with the same content can save memory.
>>
>> This introduces the page cache share feature in erofs. It allocate a
>> shared inode and use its page cache as shared. Reads for files
>> with identical content will ultimately be routed to the page cache of
>> the shared inode. In this way, a single page cache satisfies
>> multiple read requests for different files with the same contents.
>>
>> We introduce new mount option `inode_share` to enable the page
>> sharing mode during mounting. This option is used in conjunction
>> with `domain_id` to share the page cache within the same trusted
>> domain.
>>
>> Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
>> Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
>> ---
>> Documentation/filesystems/erofs.rst | 5 +
>> fs/erofs/Makefile | 1 +
>> fs/erofs/inode.c | 1 -
>> fs/erofs/internal.h | 31 ++++++
>> fs/erofs/ishare.c | 167 ++++++++++++++++++++++++++++
>> fs/erofs/super.c | 62 ++++++++++-
>> fs/erofs/xattr.c | 34 ++++++
>> fs/erofs/xattr.h | 3 +
>> 8 files changed, 301 insertions(+), 3 deletions(-)
>> create mode 100644 fs/erofs/ishare.c
>>
>> diff --git a/Documentation/filesystems/erofs.rst
>> b/Documentation/filesystems/erofs.rst
>> index 40dbf3b6a35f..bfef8e87f299 100644
>> --- a/Documentation/filesystems/erofs.rst
>> +++ b/Documentation/filesystems/erofs.rst
>> @@ -129,7 +129,12 @@ fsid=%s Specify a filesystem image
>> ID for Fscache back-end.
>> domain_id=%s Specify a trusted domain ID for fscache mode
>> so that
>> different images with the same blobs,
>> identified by blob IDs,
>> can share storage within the same trusted
>> domain.
>> + Also used for different filesystems with inode
>> page sharing
>> + enabled to share page cache within the trusted
>> domain.
>> fsoffset=%llu Specify block-aligned filesystem offset for
>> the primary device.
>> +inode_share Enable inode page sharing for this
>> filesystem. Inodes with
>> + identical content within the same domain ID
>> can share the
>> + page cache.
>> ===================
>> =========================================================
>> Sysfs Entries
>> diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
>> index 549abc424763..a80e1762b607 100644
>> --- a/fs/erofs/Makefile
>> +++ b/fs/erofs/Makefile
>> @@ -10,3 +10,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) +=
>> decompressor_zstd.o
>> erofs-$(CONFIG_EROFS_FS_ZIP_ACCEL) += decompressor_crypto.o
>> erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o
>> erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
>> +erofs-$(CONFIG_EROFS_FS_PAGE_CACHE_SHARE) += ishare.o
>> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
>> index 389632bb46c4..202cbbb4eada 100644
>> --- a/fs/erofs/inode.c
>> +++ b/fs/erofs/inode.c
>> @@ -203,7 +203,6 @@ static int erofs_read_inode(struct inode *inode)
>> static int erofs_fill_inode(struct inode *inode)
>> {
>> - struct erofs_inode *vi = EROFS_I(inode);
>
> Why is this line in this patch rather than in
> "erofs: add erofs_inode_set_aops helper to set the aops[.]"?
>
> Also, there is an unneeded dot at the end of that subject.
>
> Could you check the patches carefully before sending
> out the next version?
I am very sorry for making such a stupid mistake. :(
Thanks,
Hongbo
>
> Thanks,
> Gao Xiang
* Re: [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops.
2026-01-22 13:54 ` Gao Xiang
@ 2026-01-23 6:18 ` Christoph Hellwig
2026-01-23 7:42 ` Gao Xiang
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-01-23 6:18 UTC (permalink / raw)
To: Gao Xiang
Cc: Hongbo Li, chao, brauner, hch, djwong, amir73il, linux-fsdevel,
linux-erofs, linux-kernel
On Thu, Jan 22, 2026 at 09:54:15PM +0800, Gao Xiang wrote:
>> @@ -455,6 +455,29 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
>> return NULL;
>> }
>> +static inline int erofs_inode_set_aops(struct inode *inode,
>> + struct inode *realinode, bool no_fscache)
>> +{
>> + if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
>> + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
>> + return -EOPNOTSUPP;
>> + DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
>> + erofs_info, realinode->i_sb,
>> + "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
>> + inode->i_mapping->a_ops = &z_erofs_aops;
>
> Is that available if CONFIG_EROFS_FS_ZIP is undefined?
z_erofs_aops is declared unconditionally, and the IS_ENABLED above
ensures the compiler will never generate a reference to it.
So this is fine, and a very usual trick to make the code more
readable.
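For readers who have not seen the idiom, a minimal user-space sketch follows. It is an illustration under assumptions, not the kernel's implementation: `CONFIG_DEMO_ZIP`, `DEMO_IS_ENABLED()`, `demo_zip_aops` and `demo_set_aops()` are hypothetical names, and the macro is a simplified stand-in that needs the option defined as 0 or 1 (the kernel's IS_ENABLED() also copes with undefined options).

```c
#include <errno.h>

/* Hypothetical config switch: 1 = feature built in, 0 = compiled out. */
#ifndef CONFIG_DEMO_ZIP
#define CONFIG_DEMO_ZIP 1
#endif

/* Simplified stand-in for the kernel's IS_ENABLED(); expects 0 or 1. */
#define DEMO_IS_ENABLED(option) (option)

struct demo_aops {
	const char *name;
};

/* Declared unconditionally, so every caller type-checks in all configs. */
extern const struct demo_aops demo_zip_aops;

#if CONFIG_DEMO_ZIP
/* Defined only when the feature is built in. */
const struct demo_aops demo_zip_aops = { "zip" };
#endif

static const struct demo_aops demo_plain_aops = { "plain" };

/* Pick the operations table; returns 0 or a negative errno value. */
static int demo_set_aops(int compressed, const struct demo_aops **aops)
{
	if (compressed) {
		/* Constant condition: with the option at 0, the rest is dead. */
		if (!DEMO_IS_ENABLED(CONFIG_DEMO_ZIP))
			return -EOPNOTSUPP;
		*aops = &demo_zip_aops;
		return 0;
	}
	*aops = &demo_plain_aops;
	return 0;
}
```

Unlike an #ifdef block, the compressed branch is always parsed and type-checked. When the option is 0, the link only succeeds if the compiler discards the unreachable reference to `demo_zip_aops`, which is what the kernel's optimized builds rely on.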
* Re: [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops.
2026-01-23 6:18 ` Christoph Hellwig
@ 2026-01-23 7:42 ` Gao Xiang
2026-01-23 8:21 ` Hongbo Li
0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2026-01-23 7:42 UTC (permalink / raw)
To: Christoph Hellwig, Hongbo Li
Cc: chao, brauner, djwong, amir73il, linux-fsdevel, linux-erofs,
linux-kernel
On 2026/1/23 14:18, Christoph Hellwig wrote:
> On Thu, Jan 22, 2026 at 09:54:15PM +0800, Gao Xiang wrote:
>>> @@ -455,6 +455,29 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
>>> return NULL;
>>> }
>>> +static inline int erofs_inode_set_aops(struct inode *inode,
>>> + struct inode *realinode, bool no_fscache)
>>> +{
>>> + if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
>>> + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
>>> + return -EOPNOTSUPP;
>>> + DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
>>> + erofs_info, realinode->i_sb,
>>> + "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
>>> + inode->i_mapping->a_ops = &z_erofs_aops;
>>
>> Is that available if CONFIG_EROFS_FS_ZIP is undefined?
>
> z_erofs_aops is declared unconditionally, and the IS_ENABLED above
> ensures the compiler will never generate a reference to it.
>
> So this is fine, and a very usual trick to make the code more
> readable.
Yeah, I get your point. That is really helpful; I haven't
used that trick before.
The other problem is that the else part is incorrect. Hongbo,
how about applying the following code and resending the next
version? I will apply all patches later:
static inline int erofs_inode_set_aops(struct inode *inode,
struct inode *realinode, bool no_fscache)
{
if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
return -EOPNOTSUPP;
DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
erofs_info, realinode->i_sb,
"EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
inode->i_mapping->a_ops = &z_erofs_aops;
return 0;
}
inode->i_mapping->a_ops = &erofs_aops;
if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !no_fscache &&
erofs_is_fscache_mode(realinode->i_sb))
inode->i_mapping->a_ops = &erofs_fscache_access_aops;
if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) &&
erofs_is_fileio_mode(EROFS_SB(realinode->i_sb)))
inode->i_mapping->a_ops = &erofs_fileio_aops;
return 0;
}
Thanks,
Gao Xiang
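Why the `else` form is not equivalent to the original pair of #ifdef blocks can be shown with a small model. This is a sketch under assumptions: the Kconfig options are modelled as runtime booleans in a hypothetical `demo_cfg`, the superblock state as a hypothetical `demo_sb`, and the three address-space operation tables as enum values; none of these names exist in the kernel.

```c
#include <stdbool.h>

enum demo_aops { DEMO_AOPS_PLAIN, DEMO_AOPS_FSCACHE, DEMO_AOPS_FILEIO };

/* Models CONFIG_EROFS_FS_ONDEMAND and CONFIG_EROFS_FS_BACKED_BY_FILE. */
struct demo_cfg {
	bool ondemand;
	bool backed_by_file;
};

/* Models what the two *_mode() helpers would report for a superblock. */
struct demo_sb {
	bool fscache_mode;
	bool fileio_mode;
};

/* Shape of the v16 helper: the fileio check sits in the else branch. */
static enum demo_aops demo_pick_v16(struct demo_cfg cfg, struct demo_sb sb,
				    bool no_fscache)
{
	enum demo_aops aops = DEMO_AOPS_PLAIN;

	if (cfg.ondemand) {
		if (!no_fscache && sb.fscache_mode)
			aops = DEMO_AOPS_FSCACHE;
	} else {
		if (sb.fileio_mode)
			aops = DEMO_AOPS_FILEIO;
	}
	return aops;
}

/* Shape of the suggested helper: two independent checks. */
static enum demo_aops demo_pick_independent(struct demo_cfg cfg,
					    struct demo_sb sb, bool no_fscache)
{
	enum demo_aops aops = DEMO_AOPS_PLAIN;

	if (cfg.ondemand && !no_fscache && sb.fscache_mode)
		aops = DEMO_AOPS_FSCACHE;
	if (cfg.backed_by_file && sb.fileio_mode)
		aops = DEMO_AOPS_FILEIO;
	return aops;
}
```

With both options enabled and a file-backed mount, demo_pick_v16() never reaches the fileio check and leaves the plain table in place, while demo_pick_independent() selects the fileio table, as the original #ifdef code did.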
* Re: [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops.
2026-01-23 7:42 ` Gao Xiang
@ 2026-01-23 8:21 ` Hongbo Li
0 siblings, 0 replies; 17+ messages in thread
From: Hongbo Li @ 2026-01-23 8:21 UTC (permalink / raw)
To: Gao Xiang, Christoph Hellwig
Cc: chao, brauner, djwong, amir73il, linux-fsdevel, linux-erofs,
linux-kernel
Hi Xiang and Christoph,
On 2026/1/23 15:42, Gao Xiang wrote:
>
>
> On 2026/1/23 14:18, Christoph Hellwig wrote:
>> On Thu, Jan 22, 2026 at 09:54:15PM +0800, Gao Xiang wrote:
>>>> @@ -455,6 +455,29 @@ static inline void *erofs_vm_map_ram(struct
>>>> page **pages, unsigned int count)
>>>> return NULL;
>>>> }
>>>> +static inline int erofs_inode_set_aops(struct inode *inode,
>>>> + struct inode *realinode, bool no_fscache)
>>>> +{
>>>> + if
>>>> (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
>>>> + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
>>>> + return -EOPNOTSUPP;
>>>> + DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
>>>> + erofs_info, realinode->i_sb,
>>>> + "EXPERIMENTAL EROFS subpage compressed block support
>>>> in use. Use at your own risk!");
>>>> + inode->i_mapping->a_ops = &z_erofs_aops;
>>>
>>> Is that available if CONFIG_EROFS_FS_ZIP is undefined?
>>
>> z_erofs_aops is declared unconditionally, and the IS_ENABLED above
>> ensures the compiler will never generate a reference to it.
>>
>> So this is fine, and a very usual trick to make the code more
>> readable.
>
> Yeah, I get your point. That is really helpful; I haven't
> used that trick before.
>
> The other problem is that the else part is incorrect. Hongbo,
> how about applying the following code and resending the next
> version? I will apply all patches later:
>
Thank you very much for your careful review and help. It was indeed my
own mistake (I have been making errors too easily lately, which has taught
me a lot...).
I have posted the new version at:
https://lore.kernel.org/all/20260123075239.664330-1-lihongbo22@huawei.com/
Thanks,
Hongbo
> static inline int erofs_inode_set_aops(struct inode *inode,
> struct inode *realinode, bool
> no_fscache)
> {
> if
> (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
> if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
> return -EOPNOTSUPP;
> DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT,
> erofs_info, realinode->i_sb,
> "EXPERIMENTAL EROFS subpage compressed block
> support in use. Use at your own risk!");
> inode->i_mapping->a_ops = &z_erofs_aops;
> return 0;
> }
> inode->i_mapping->a_ops = &erofs_aops;
> if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !no_fscache &&
> erofs_is_fscache_mode(realinode->i_sb))
> inode->i_mapping->a_ops = &erofs_fscache_access_aops;
> if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) &&
> erofs_is_fileio_mode(EROFS_SB(realinode->i_sb)))
> inode->i_mapping->a_ops = &erofs_fileio_aops;
> return 0;
> }
>
> Thanks,
> Gao Xiang
Thread overview: 17+ messages
2026-01-22 13:37 [PATCH v16 00/10] erofs: Introduce page cache sharing feature Hongbo Li
2026-01-22 13:37 ` [PATCH v16 01/10] fs: Export alloc_empty_backing_file Hongbo Li
2026-01-22 13:37 ` [PATCH v16 02/10] erofs: decouple `struct erofs_anon_fs_type` Hongbo Li
2026-01-22 13:37 ` [PATCH v16 03/10] erofs: support user-defined fingerprint name Hongbo Li
2026-01-22 13:37 ` [PATCH v16 04/10] erofs: add erofs_inode_set_aops helper to set the aops Hongbo Li
2026-01-22 13:54 ` Gao Xiang
2026-01-23 6:18 ` Christoph Hellwig
2026-01-23 7:42 ` Gao Xiang
2026-01-23 8:21 ` Hongbo Li
2026-01-22 13:37 ` [PATCH v16 05/10] erofs: using domain_id in the safer way Hongbo Li
2026-01-22 13:37 ` [PATCH v16 06/10] erofs: introduce the page cache share feature Hongbo Li
2026-01-22 14:01 ` Gao Xiang
2026-01-22 15:09 ` Hongbo Li
2026-01-22 13:37 ` [PATCH v16 07/10] erofs: pass inode to trace_erofs_read_folio Hongbo Li
2026-01-22 13:37 ` [PATCH v16 08/10] erofs: support unencoded inodes for page cache share Hongbo Li
2026-01-22 13:37 ` [PATCH v16 09/10] erofs: support compressed " Hongbo Li
2026-01-22 13:37 ` [PATCH v16 10/10] erofs: implement .fadvise " Hongbo Li