* [PATCH] vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
From: Yafang Shao @ 2025-05-11 8:36 UTC
To: viro, brauner, jack; +Cc: linux-fsdevel, Yafang Shao
On our HDFS servers with 12 HDDs per server, an HDFS datanode[0] startup
involves scanning all files and caching their metadata (including dentries
and inodes) in memory. Each HDD contains approximately 2 million files,
resulting in a total of ~20 million cached dentries after initialization.
To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite
this configuration, memory pressure conditions can still trigger
reclamation of up to 50% of cached dentries, reducing the cache from 20
million to approximately 10 million entries. During the subsequent cache
rebuild period, any HDFS datanode restart operation incurs substantial
latency penalties until full cache recovery completes.
To maintain service stability, we need to preserve more dentries during
memory reclamation. The current minimum reclaim ratio (1/100 of total
dentries) remains too aggressive for our workload. This patch introduces
vfs_cache_pressure_denom for more granular cache pressure control. The
configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000]
effectively maintains the full 20 million dentry cache under memory
pressure, preventing datanode restart performance degradation.
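As a rough illustration (not part of the patch itself): the count of
freeable objects reported to the shrinker ends up scaled by
vfs_cache_pressure / vfs_cache_pressure_denom. The small user-space sketch
below models the overflow-avoiding split used by the kernel's mult_frac()
macro; the helper name is made up and only the two configurations discussed
above are assumed.

  #include <stdio.h>

  /* Model of the scaling applied by vfs_pressure_ratio(). */
  static unsigned long scale(unsigned long count, unsigned long pressure,
                             unsigned long denom)
  {
          return (count / denom) * pressure +
                 ((count % denom) * pressure) / denom;
  }

  int main(void)
  {
          unsigned long dentries = 20UL * 1000 * 1000; /* ~20 million dentries */

          /* Default denominator: even pressure=1 exposes 1% of the cache. */
          printf("pressure=1 denom=100:   %lu freeable objects reported\n",
                 scale(dentries, 1, 100));

          /* Larger denominator: the minimum ratio drops to 1/10000. */
          printf("pressure=1 denom=10000: %lu freeable objects reported\n",
                 scale(dentries, 1, 10000));

          return 0;
  }

For this cache size the per-pass scan target drops from roughly 200,000
dentries to roughly 2,000.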
Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 32 ++++++++++++++++---------
 fs/dcache.c                             | 11 ++++++++-
2 files changed, 31 insertions(+), 12 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 8290177b4f75..d385985b305f 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -75,6 +75,7 @@ Currently, these files are in /proc/sys/vm:
- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
+- vfs_cache_pressure_denom
- watermark_boost_factor
- watermark_scale_factor
- zone_reclaim_mode
@@ -1017,19 +1018,28 @@ vfs_cache_pressure
This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
+At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
+will attempt to reclaim dentries and inodes at a "fair" rate with respect to
+pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the
+kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
+the kernel will never reclaim dentries and inodes due to memory pressure and
+this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
+beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
+and inodes.
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
+Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
+have negative performance impact. Reclaim code needs to take various locks to
+find freeable directory and inode objects. When vfs_cache_pressure equals
+(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
+objects than there are.
+Note: This setting should always be used together with vfs_cache_pressure_denom.
+
+vfs_cache_pressure_denom
+========================
+
+Defaults to 100 (minimum allowed value). Requires corresponding
+vfs_cache_pressure setting to take effect.
watermark_boost_factor
======================
diff --git a/fs/dcache.c b/fs/dcache.c
index bd5aa136153a..ed46818c151c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -74,10 +74,11 @@
* arbitrary, since it's serialized on rename_lock
*/
static int sysctl_vfs_cache_pressure __read_mostly = 100;
+static int sysctl_vfs_cache_pressure_denom __read_mostly = 100;
unsigned long vfs_pressure_ratio(unsigned long val)
{
- return mult_frac(val, sysctl_vfs_cache_pressure, 100);
+ return mult_frac(val, sysctl_vfs_cache_pressure, sysctl_vfs_cache_pressure_denom);
}
EXPORT_SYMBOL_GPL(vfs_pressure_ratio);
@@ -225,6 +226,14 @@ static const struct ctl_table vm_dcache_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "vfs_cache_pressure_denom",
+ .data = &sysctl_vfs_cache_pressure_denom,
+ .maxlen = sizeof(sysctl_vfs_cache_pressure_denom),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE_HUNDRED,
+ },
};
static int __init init_fs_dcache_sysctls(void)
--
2.43.5
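With this applied, the new knob appears as /proc/sys/vm/vfs_cache_pressure_denom
next to the existing vfs_cache_pressure file (it is added to the /proc/sys/vm
list in the documentation hunk above). Below is a minimal user-space sketch of
applying the configuration from the commit message; the helper is hypothetical
and error handling is kept small. Note that proc_dointvec_minmax with
extra1 = SYSCTL_ONE_HUNDRED rejects denominators below 100.

  #include <stdio.h>
  #include <stdlib.h>

  /* Write a single sysctl file; bail out on any error. */
  static void write_sysctl(const char *path, const char *val)
  {
          FILE *f = fopen(path, "w");

          if (!f || fprintf(f, "%s\n", val) < 0 || fclose(f) != 0) {
                  perror(path);
                  exit(EXIT_FAILURE);
          }
  }

  int main(void)
  {
          /* Equivalent to: sysctl -w vm.vfs_cache_pressure_denom=10000
           *                sysctl -w vm.vfs_cache_pressure=1
           */
          write_sysctl("/proc/sys/vm/vfs_cache_pressure_denom", "10000");
          write_sysctl("/proc/sys/vm/vfs_cache_pressure", "1");
          return 0;
  }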
* Re: [PATCH] vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
From: Jan Kara @ 2025-05-13 10:14 UTC
To: Yafang Shao; +Cc: viro, brauner, jack, linux-fsdevel
On Sun 11-05-25 16:36:24, Yafang Shao wrote:
> On our HDFS servers with 12 HDDs per server, an HDFS datanode[0] startup
> involves scanning all files and caching their metadata (including dentries
> and inodes) in memory. Each HDD contains approximately 2 million files,
> resulting in a total of ~20 million cached dentries after initialization.
>
> To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite
> this configuration, memory pressure conditions can still trigger
> reclamation of up to 50% of cached dentries, reducing the cache from 20
> million to approximately 10 million entries. During the subsequent cache
> rebuild period, any HDFS datanode restart operation incurs substantial
> latency penalties until full cache recovery completes.
>
> To maintain service stability, we need to preserve more dentries during
> memory reclamation. The current minimum reclaim ratio (1/100 of total
> dentries) remains too aggressive for our workload. This patch introduces
> vfs_cache_pressure_denom for more granular cache pressure control. The
> configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000]
> effectively maintains the full 20 million dentry cache under memory
> pressure, preventing datanode restart performance degradation.
>
> Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0]
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Makes sense. The patch looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> Documentation/admin-guide/sysctl/vm.rst | 32 ++++++++++++++++---------
> fs/dcache.c | 11 ++++++++-
> 2 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 8290177b4f75..d385985b305f 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -75,6 +75,7 @@ Currently, these files are in /proc/sys/vm:
> - unprivileged_userfaultfd
> - user_reserve_kbytes
> - vfs_cache_pressure
> +- vfs_cache_pressure_denom
> - watermark_boost_factor
> - watermark_scale_factor
> - zone_reclaim_mode
> @@ -1017,19 +1018,28 @@ vfs_cache_pressure
> This percentage value controls the tendency of the kernel to reclaim
> the memory which is used for caching of directory and inode objects.
>
> -At the default value of vfs_cache_pressure=100 the kernel will attempt to
> -reclaim dentries and inodes at a "fair" rate with respect to pagecache and
> -swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
> -to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
> -never reclaim dentries and inodes due to memory pressure and this can easily
> -lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
> -causes the kernel to prefer to reclaim dentries and inodes.
> +At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
> +will attempt to reclaim dentries and inodes at a "fair" rate with respect to
> +pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the
> +kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
> +the kernel will never reclaim dentries and inodes due to memory pressure and
> +this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
> +beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
> +and inodes.
>
> -Increasing vfs_cache_pressure significantly beyond 100 may have negative
> -performance impact. Reclaim code needs to take various locks to find freeable
> -directory and inode objects. With vfs_cache_pressure=1000, it will look for
> -ten times more freeable objects than there are.
> +Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
> +have negative performance impact. Reclaim code needs to take various locks to
> +find freeable directory and inode objects. When vfs_cache_pressure equals
> +(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
> +objects than there are.
>
> +Note: This setting should always be used together with vfs_cache_pressure_denom.
> +
> +vfs_cache_pressure_denom
> +========================
> +
> +Defaults to 100 (minimum allowed value). Requires corresponding
> +vfs_cache_pressure setting to take effect.
>
> watermark_boost_factor
> ======================
> diff --git a/fs/dcache.c b/fs/dcache.c
> index bd5aa136153a..ed46818c151c 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -74,10 +74,11 @@
> * arbitrary, since it's serialized on rename_lock
> */
> static int sysctl_vfs_cache_pressure __read_mostly = 100;
> +static int sysctl_vfs_cache_pressure_denom __read_mostly = 100;
>
> unsigned long vfs_pressure_ratio(unsigned long val)
> {
> - return mult_frac(val, sysctl_vfs_cache_pressure, 100);
> + return mult_frac(val, sysctl_vfs_cache_pressure, sysctl_vfs_cache_pressure_denom);
> }
> EXPORT_SYMBOL_GPL(vfs_pressure_ratio);
>
> @@ -225,6 +226,14 @@ static const struct ctl_table vm_dcache_sysctls[] = {
> .proc_handler = proc_dointvec_minmax,
> .extra1 = SYSCTL_ZERO,
> },
> + {
> + .procname = "vfs_cache_pressure_denom",
> + .data = &sysctl_vfs_cache_pressure_denom,
> + .maxlen = sizeof(sysctl_vfs_cache_pressure_denom),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ONE_HUNDRED,
> + },
> };
>
> static int __init init_fs_dcache_sysctls(void)
> --
> 2.43.5
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
From: Christian Brauner @ 2025-05-15 9:13 UTC
To: Yafang Shao; +Cc: Christian Brauner, linux-fsdevel, viro, jack
On Sun, 11 May 2025 16:36:24 +0800, Yafang Shao wrote:
> On our HDFS servers with 12 HDDs per server, an HDFS datanode[0] startup
> involves scanning all files and caching their metadata (including dentries
> and inodes) in memory. Each HDD contains approximately 2 million files,
> resulting in a total of ~20 million cached dentries after initialization.
>
> To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite
> this configuration, memory pressure conditions can still trigger
> reclamation of up to 50% of cached dentries, reducing the cache from 20
> million to approximately 10 million entries. During the subsequent cache
> rebuild period, any HDFS datanode restart operation incurs substantial
> latency penalties until full cache recovery completes.
>
> [...]
Applied to the vfs-6.16.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.16.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.16.misc
[1/1] vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
https://git.kernel.org/vfs/vfs/c/e7b9cea718ee