* [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
@ 2026-05-14 15:13 Horst Birthelmer
2026-05-15 15:09 ` kernel test robot
2026-05-15 15:09 ` kernel test robot
0 siblings, 2 replies; 3+ messages in thread
From: Horst Birthelmer @ 2026-05-14 15:13 UTC
To: Miklos Szeredi, Jonathan Corbet, Shuah Khan, Alexander Viro,
Christian Brauner, Jan Kara
Cc: linux-doc, linux-kernel, linux-fsdevel, Horst Birthelmer
From: Horst Birthelmer <hbirthelmer@ddn.com>
The dcache only shrinks under memory pressure, which is rarely reached
on machines with ample RAM, so cached negative dentries can accumulate
without bound. Give administrators a soft cap they can set,
and a background worker that prefers negative dentries when reclaiming.
Two new sysctls under /proc/sys/fs/:

  dentry-limit              -- soft cap on nr_dentry. 0 (default)
                               disables the feature; behaviour is then
                               identical to before.
  dentry-limit-interval-ms  -- pacing for the worker while still over
                               the cap. Default 1000, minimum 1.

When the cap is exceeded, a delayed_work runs in two phases:

  1. iterate_supers() draining only negative dentries from every LRU.
     Positive entries are rotated past so the walk makes progress.
     DCACHE_REFERENCED is ignored here on purpose -- an admin-imposed
     cap should evict even hot negatives before any positive entry.
  2. If still over the cap, iterate_supers() again with the same
     isolate callback the memory-pressure shrinker uses.

Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
---
There was a discussion at LSFMM about servers with too many cached
negative dentries.
That gave me the idea of letting the system administrator limit the
total number of dentries when needed.
This is somewhat related to [1]: it addresses the same symptoms, but
less obtrusively, by garbage collecting the negative entries first
and then the unused cache entries.
The other effect I have seen is that FUSE does not forget inodes
(no FORGET call to the FUSE server) until long after the last
reference has been closed.
A FUSE server that mirrors the kernel's cached inodes in user space,
because it has to keep a lot of private data for every node,
is put under unnecessary memory strain by this,
especially if memory is limited for its cgroup.
[1]: https://lore.kernel.org/linux-fsdevel/20260331012925.74840-1-raven@themaw.net/
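For readers who want to see the policy in isolation, the two-phase order from the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct entry`, `prune()`, `reap()` and both callbacks are hypothetical stand-ins, not the kernel's `struct dentry`, `list_lru_walk()` or the patch's functions, and the array walk ignores locking, superblocks and batching entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached entry; not the kernel's struct dentry. */
struct entry {
	bool negative;   /* entry has no backing inode */
	bool referenced; /* recently used; phase 2 gives it a second chance */
	bool live;       /* still in the cache */
};

typedef bool (*may_evict_fn)(struct entry *e);

/* Phase 1 policy: evict only negatives and ignore the referenced bit. */
static bool evict_negative_only(struct entry *e)
{
	return e->negative;
}

/* Phase 2 policy: clear the referenced bit and skip once, otherwise evict. */
static bool evict_lru_policy(struct entry *e)
{
	if (e->referenced) {
		e->referenced = false;
		return false;
	}
	return true;
}

/* One pass over the cache; evicts at most 'over' entries, returns the count. */
static long prune(struct entry *cache, size_t n, long over, may_evict_fn may_evict)
{
	long freed = 0;

	for (size_t i = 0; i < n && freed < over; i++) {
		if (cache[i].live && may_evict(&cache[i])) {
			cache[i].live = false;
			freed++;
		}
	}
	return freed;
}

/* Two-phase reaper; returns how many entries remain cached. limit 0 disables it. */
static long reap(struct entry *cache, size_t n, long limit)
{
	long live = 0;
	long over;

	assert(cache != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		live += cache[i].live;

	over = live - limit;
	if (limit == 0 || over <= 0)
		return live;

	/* Phase 1: negatives only, even hot ones. */
	over -= prune(cache, n, over, evict_negative_only);

	/* Phase 2: still over the cap, so fall back to the ordinary policy. */
	if (over > 0)
		over -= prune(cache, n, over, evict_lru_policy);

	return limit + over;
}
```

With a four-entry cache holding two negatives and a limit of 1, `reap()` in this sketch drops both negatives in phase 1 and needs phase 2 for only one more entry, so a referenced positive entry can survive even though a referenced negative one did not.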
---
Documentation/admin-guide/sysctl/fs.rst | 28 +++++
fs/dcache.c | 197 ++++++++++++++++++++++++++++++++
2 files changed, 225 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 9b7f65c3efd8..0229aea45d85 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -38,6 +38,34 @@ requests. ``aio-max-nr`` allows you to change the maximum value
``aio-max-nr`` does not result in the
pre-allocation or re-sizing of any kernel data structures.
+dentry-limit
+------------
+
+Soft cap on the total number of dentries allocated system-wide (i.e. on
+``nr_dentry`` from ``dentry-state``). A value of ``0`` (the default)
+disables the feature and the dcache grows or shrinks only under memory
+pressure as before.
+
+When set to a non-zero value, a background worker is woken whenever
+the live dentry count exceeds the limit. The worker walks every
+superblock's LRU and prefers to evict negative dentries first; if it
+cannot get back under the limit using negative entries alone it falls
+back to the same LRU policy used by the memory-pressure shrinker.
+
+The limit is *soft*: allocations never fail because of it, and brief
+overshoots while the worker catches up are expected. Set the cap a
+comfortable margin above your steady-state working set.
+
+dentry-limit-interval-ms
+------------------------
+
+How often, in milliseconds, the ``dentry-limit`` worker re-runs while
+``nr_dentry`` is still above the cap. Defaults to ``1000`` (one
+second); the minimum accepted value is ``1``. Smaller values trim the
+cache more aggressively at the cost of more CPU spent walking LRUs;
+larger values let temporary spikes ride out before any work is done.
+Has no effect when ``dentry-limit`` is ``0``.
+
dentry-negative
----------------------------
diff --git a/fs/dcache.c b/fs/dcache.c
index 2c61aeea41f4..4959d2c011c0 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -144,6 +144,19 @@ static DEFINE_PER_CPU(long, nr_dentry_unused);
static DEFINE_PER_CPU(long, nr_dentry_negative);
static int dentry_negative_policy;
+/*
+ * Soft cap on the total number of dentries. When non-zero and exceeded,
+ * a background worker prunes unused dentries (preferring negative ones)
+ * until we are back under the limit. Zero (the default) disables the
+ * feature entirely; the fast path in __d_alloc() only pays the cost of
+ * a READ_ONCE and a branch in that case.
+ */
+static unsigned long sysctl_dentry_limit __read_mostly;
+static unsigned int sysctl_dentry_limit_interval_ms __read_mostly = 1000;
+static unsigned long dentry_limit_last_kick;
+
+static void dentry_limit_kick(void);
+
#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
/* Statistics gathering. */
static struct dentry_stat_t dentry_stat = {
@@ -199,6 +212,20 @@ static int proc_nr_dentry(const struct ctl_table *table, int write, void *buffer
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
+/*
+ * Writing fs.dentry-limit should give prompt feedback to admins
+ * lowering the cap, so kick the worker on every successful write.
+ */
+static int proc_dentry_limit(const struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (write && !ret)
+ dentry_limit_kick();
+ return ret;
+}
+
static const struct ctl_table fs_dcache_sysctls[] = {
{
.procname = "dentry-state",
@@ -207,6 +234,21 @@ static const struct ctl_table fs_dcache_sysctls[] = {
.mode = 0444,
.proc_handler = proc_nr_dentry,
},
+ {
+ .procname = "dentry-limit",
+ .data = &sysctl_dentry_limit,
+ .maxlen = sizeof(sysctl_dentry_limit),
+ .mode = 0644,
+ .proc_handler = proc_dentry_limit,
+ },
+ {
+ .procname = "dentry-limit-interval-ms",
+ .data = &sysctl_dentry_limit_interval_ms,
+ .maxlen = sizeof(sysctl_dentry_limit_interval_ms),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ },
{
.procname = "dentry-negative",
.data = &dentry_negative_policy,
@@ -1325,6 +1367,160 @@ static enum lru_status dentry_lru_isolate_shrink(struct list_head *item,
return LRU_REMOVED;
}
+#define DENTRY_LIMIT_BATCH 1024UL
+
+static void dentry_limit_worker_fn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(dentry_limit_work, dentry_limit_worker_fn);
+
+/*
+ * Variant of dentry_lru_isolate() that only frees negative dentries.
+ * DCACHE_REFERENCED is intentionally not honoured here: the whole point
+ * of an admin-imposed cap on negatives is that even frequently-looked-up
+ * negative entries should be evicted before any positive dentry.
+ * Positive entries are rotated to the tail so the walk continues to
+ * make progress without disturbing their LRU position.
+ */
+static enum lru_status dentry_lru_isolate_negative(struct list_head *item,
+ struct list_lru_one *lru, void *arg)
+{
+ struct list_head *freeable = arg;
+ struct dentry *dentry = container_of(item, struct dentry, d_lru);
+
+ if (!spin_trylock(&dentry->d_lock))
+ return LRU_SKIP;
+
+ /* Same handling as dentry_lru_isolate() for in-use entries. */
+ if (dentry->d_lockref.count) {
+ d_lru_isolate(lru, dentry);
+ spin_unlock(&dentry->d_lock);
+ return LRU_REMOVED;
+ }
+
+ if (!d_is_negative(dentry)) {
+ spin_unlock(&dentry->d_lock);
+ return LRU_ROTATE;
+ }
+
+ d_lru_shrink_move(lru, dentry, freeable);
+ spin_unlock(&dentry->d_lock);
+ return LRU_REMOVED;
+}
+
+struct dentry_limit_ctx {
+ long over; /* remaining dentries to evict */
+ list_lru_walk_cb isolate;
+};
+
+static void dentry_limit_prune_sb(struct super_block *sb, void *arg)
+{
+ struct dentry_limit_ctx *ctx = arg;
+ unsigned long walked = 0;
+ unsigned long budget;
+
+ if (ctx->over <= 0)
+ return;
+
+ /*
+ * Walk up to one full pass of this superblock's LRU, in
+ * DENTRY_LIMIT_BATCH-sized chunks. The loop matters mainly for
+ * phase 1: dentry_lru_isolate_negative() returns LRU_ROTATE for
+ * positive dentries, which still counts against list_lru_walk()'s
+ * nr_to_walk. A single batch can therefore finish having freed
+ * nothing when positives crowd the head of the LRU, and without
+ * the inner loop the worker would have to wait a full
+ * dentry-limit-interval-ms before retrying, possibly never reaching
+ * the negatives buried behind a long run of positives.
+ *
+ * The budget is snapshot at entry so a filesystem allocating
+ * dentries faster than we drain them can't keep us spinning here
+ * forever; freshly added dentries are picked up on the next
+ * worker invocation.
+ *
+ * Phase 2 normally exits much sooner: its isolate callback frees
+ * any non-referenced dentry, so ctx->over typically hits zero
+ * inside the first batch. The worst-case over-eviction is one
+ * batch past the cap, which is within the soft semantics of
+ * fs.dentry-limit.
+ */
+ budget = list_lru_count(&sb->s_dentry_lru);
+
+ while (ctx->over > 0 && walked < budget) {
+ LIST_HEAD(dispose);
+ unsigned long nr;
+ long freed;
+
+ nr = min(DENTRY_LIMIT_BATCH, budget - walked);
+ freed = list_lru_walk(&sb->s_dentry_lru, ctx->isolate,
+ &dispose, nr);
+ shrink_dentry_list(&dispose);
+
+ ctx->over -= freed;
+ walked += nr;
+
+ cond_resched();
+ }
+}
+
+static void dentry_limit_worker_fn(struct work_struct *work)
+{
+ struct dentry_limit_ctx ctx;
+ unsigned long limit = READ_ONCE(sysctl_dentry_limit);
+ unsigned int ms;
+ long nr;
+
+ if (!limit)
+ return;
+
+ nr = get_nr_dentry();
+ if (nr <= (long)limit)
+ return;
+
+ ctx.over = nr - (long)limit;
+
+ /* Phase 1: drain negative dentries across every superblock. */
+ ctx.isolate = dentry_lru_isolate_negative;
+ iterate_supers(dentry_limit_prune_sb, &ctx);
+
+ /* Phase 2: still over? Apply the ordinary LRU policy. */
+ if (ctx.over > 0) {
+ ctx.isolate = dentry_lru_isolate;
+ iterate_supers(dentry_limit_prune_sb, &ctx);
+ }
+
+ /*
+ * Re-arm while still above the limit. Re-read the sysctls in
+ * case the admin raised the cap or disabled the feature during
+ * the walk.
+ */
+ limit = READ_ONCE(sysctl_dentry_limit);
+ if (!limit || get_nr_dentry() <= (long)limit)
+ return;
+
+ ms = READ_ONCE(sysctl_dentry_limit_interval_ms);
+ queue_delayed_work(system_unbound_wq, &dentry_limit_work,
+ msecs_to_jiffies(ms));
+}
+
+static void dentry_limit_kick(void)
+{
+ unsigned long limit = READ_ONCE(sysctl_dentry_limit);
+ unsigned long now;
+
+ if (!limit)
+ return;
+ if (delayed_work_pending(&dentry_limit_work))
+ return;
+
+ now = jiffies;
+ if (time_before(now, READ_ONCE(dentry_limit_last_kick) + HZ / 10))
+ return;
+ WRITE_ONCE(dentry_limit_last_kick, now);
+
+ if (get_nr_dentry() <= (long)limit)
+ return;
+
+ queue_delayed_work(system_unbound_wq, &dentry_limit_work, 0);
+}
/**
* shrink_dcache_sb - shrink dcache for a superblock
@@ -1868,6 +2064,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
}
this_cpu_inc(nr_dentry);
+ dentry_limit_kick();
return dentry;
}
---
base-commit: 5d6919055dec134de3c40167a490f33c74c12581
change-id: 20260513-limit-dentries-cache-63685729672b
Best regards,
--
Horst Birthelmer <hbirthelmer@ddn.com>
* Re: [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
2026-05-14 15:13 [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper Horst Birthelmer
@ 2026-05-15 15:09 ` kernel test robot
2026-05-15 15:09 ` kernel test robot
1 sibling, 0 replies; 3+ messages in thread
From: kernel test robot @ 2026-05-15 15:09 UTC
To: Horst Birthelmer, Miklos Szeredi, Jonathan Corbet, Shuah Khan,
Alexander Viro, Christian Brauner, Jan Kara
Cc: oe-kbuild-all, linux-doc, linux-kernel, linux-fsdevel,
Horst Birthelmer
Hi Horst,
kernel test robot noticed the following build errors:
[auto build test ERROR on 5d6919055dec134de3c40167a490f33c74c12581]
url: https://github.com/intel-lab-lkp/linux/commits/Horst-Birthelmer/dcache-add-fs-dentry-limit-sysctl-with-negative-first-reaper/20260515-154600
base: 5d6919055dec134de3c40167a490f33c74c12581
patch link: https://lore.kernel.org/r/20260514-limit-dentries-cache-v1-1-431b9eb0c530%40ddn.com
patch subject: [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
config: openrisc-randconfig-r073-20260515 (https://download.01.org/0day-ci/archive/20260515/202605152333.0pOd2zJR-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 10.5.0
smatch: v0.5.0-9185-gbcc58b9c
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260515/202605152333.0pOd2zJR-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605152333.0pOd2zJR-lkp@intel.com/
All errors (new ones prefixed by >>):
fs/dcache.c: In function 'dentry_limit_worker_fn':
>> fs/dcache.c:1474:7: error: implicit declaration of function 'get_nr_dentry'; did you mean 'retain_dentry'? [-Werror=implicit-function-declaration]
1474 | nr = get_nr_dentry();
| ^~~~~~~~~~~~~
| retain_dentry
cc1: some warnings being treated as errors
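A plausible reading of the error above, stated here as an assumption rather than something the report confirms: in the base tree `get_nr_dentry()` appears to live inside the `#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)` statistics block, while the new worker and `dentry_limit_kick()` are built unconditionally, so randconfigs without those options see no declaration. The userspace sketch below shows the general shape of one possible fix, which is to define the shared helper outside the guard. `HAVE_STATS_SYSCTL`, `get_nr_entries()`, `stats_read()` and `over_limit()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical stand-in for CONFIG_SYSCTL && CONFIG_PROC_FS. It is left
 * undefined on purpose to mimic the failing randconfig builds.
 */
/* #define HAVE_STATS_SYSCTL 1 */

/* Stand-in for the summed per-CPU counter. */
static long nr_entries = 42;

/*
 * The shared helper sits outside the config guard, so every caller sees a
 * definition whether or not the statistics code is compiled in.
 */
static long get_nr_entries(void)
{
	return nr_entries < 0 ? 0 : nr_entries;
}

#ifdef HAVE_STATS_SYSCTL
/* Only the statistics reader stays behind the guard. */
static long stats_read(void)
{
	return get_nr_entries();
}
#endif

/* Unconditional user of the helper, analogous to the limit check. */
static int over_limit(unsigned long limit)
{
	/* The comparison casts to long, so keep the limit in range. */
	assert(limit <= LONG_MAX);
	return limit && get_nr_entries() > (long)limit;
}
```

The alternative would be to put the new limit code behind the same guard; which option suits the patch depends on whether the cap is meant to work without the sysctl interface.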
vim +1474 fs/dcache.c
1463
1464 static void dentry_limit_worker_fn(struct work_struct *work)
1465 {
1466 struct dentry_limit_ctx ctx;
1467 unsigned long limit = READ_ONCE(sysctl_dentry_limit);
1468 unsigned int ms;
1469 long nr;
1470
1471 if (!limit)
1472 return;
1473
> 1474 nr = get_nr_dentry();
1475 if (nr <= (long)limit)
1476 return;
1477
1478 ctx.over = nr - (long)limit;
1479
1480 /* Phase 1: drain negative dentries across every superblock. */
1481 ctx.isolate = dentry_lru_isolate_negative;
1482 iterate_supers(dentry_limit_prune_sb, &ctx);
1483
1484 /* Phase 2: still over? Apply the ordinary LRU policy. */
1485 if (ctx.over > 0) {
1486 ctx.isolate = dentry_lru_isolate;
1487 iterate_supers(dentry_limit_prune_sb, &ctx);
1488 }
1489
1490 /*
1491 * Re-arm while still above the limit. Re-read the sysctls in
1492 * case the admin raised the cap or disabled the feature during
1493 * the walk.
1494 */
1495 limit = READ_ONCE(sysctl_dentry_limit);
1496 if (!limit || get_nr_dentry() <= (long)limit)
1497 return;
1498
1499 ms = READ_ONCE(sysctl_dentry_limit_interval_ms);
1500 queue_delayed_work(system_unbound_wq, &dentry_limit_work,
1501 msecs_to_jiffies(ms));
1502 }
1503
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
2026-05-14 15:13 [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper Horst Birthelmer
2026-05-15 15:09 ` kernel test robot
@ 2026-05-15 15:09 ` kernel test robot
1 sibling, 0 replies; 3+ messages in thread
From: kernel test robot @ 2026-05-15 15:09 UTC
To: Horst Birthelmer, Miklos Szeredi, Jonathan Corbet, Shuah Khan,
Alexander Viro, Christian Brauner, Jan Kara
Cc: llvm, oe-kbuild-all, linux-doc, linux-kernel, linux-fsdevel,
Horst Birthelmer
Hi Horst,
kernel test robot noticed the following build errors:
[auto build test ERROR on 5d6919055dec134de3c40167a490f33c74c12581]
url: https://github.com/intel-lab-lkp/linux/commits/Horst-Birthelmer/dcache-add-fs-dentry-limit-sysctl-with-negative-first-reaper/20260515-154600
base: 5d6919055dec134de3c40167a490f33c74c12581
patch link: https://lore.kernel.org/r/20260514-limit-dentries-cache-v1-1-431b9eb0c530%40ddn.com
patch subject: [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
config: s390-randconfig-002-20260515 (https://download.01.org/0day-ci/archive/20260515/202605152329.WHnEvZt7-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260515/202605152329.WHnEvZt7-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605152329.WHnEvZt7-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/dcache.c:1474:7: error: call to undeclared function 'get_nr_dentry'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1474 | nr = get_nr_dentry();
| ^
fs/dcache.c:1474:7: note: did you mean 'retain_dentry'?
fs/dcache.c:835:20: note: 'retain_dentry' declared here
835 | static inline bool retain_dentry(struct dentry *dentry, bool locked)
| ^
fs/dcache.c:1519:6: error: call to undeclared function 'get_nr_dentry'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1519 | if (get_nr_dentry() <= (long)limit)
| ^
2 errors generated.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki