* [PATCH v2 0/7] nfsd: filecache: various fixes
@ 2025-02-18 15:39 cel
2025-02-18 15:39 ` [PATCH v2 1/7] nfsd: filecache: remove race handling cel
` (7 more replies)
0 siblings, 8 replies; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Redriving to continue review dialogue:
> following is my attempt to fix up shrinking and gc of the NFSv3 filecache entries.
> There series is against nfsd-next.
https://lore.kernel.org/linux-nfs/20250207051701.3467505-1-neilb@suse.de/
Changes since original posting:
- I've assumed the role of shepherd for this series
- Rebase on the public nfsd-testing branch
- Remove nfsd_file_laundrette() call from nfsd_file_put() earlier in
the series
- Rework nfsd_file_gc_worker()
- Clarify one or two commit messages
Chuck Lever (1):
NFSD: Re-organize nfsd_file_gc_worker()
NeilBrown (6):
nfsd: filecache: remove race handling.
nfsd: filecache: use nfsd_file_dispose_list() in
nfsd_file_close_inode_sync()
nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
nfsd: filecache: introduce NFSD_FILE_RECENT
nfsd: filecache: don't repeatedly add/remove files on the lru list
nfsd: filecache: drop the list_lru lock during lock gc scans
fs/nfsd/filecache.c | 122 ++++++++++++++++++++++++--------------------
fs/nfsd/filecache.h | 7 +++
fs/nfsd/trace.h | 3 ++
3 files changed, 77 insertions(+), 55 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/7] nfsd: filecache: remove race handling.
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
@ 2025-02-18 15:39 ` cel
2025-02-18 15:39 ` [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker() cel
` (6 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
From: NeilBrown <neilb@suse.de>
The race that this code tries to protect against is not interesting.
The code is problematic as we access the "nf" after we have given our
reference to the lru system. While that takes 2+ seconds to free
things, it is still poor form.
The only interesting race I can find would be with
nfsd_file_close_inode_sync();
This is the only place that really doesn't want the file to stay on the
LRU when unhashed (which is the direct consequence of the race).
However for the race to happen, some other thread must own a reference
to a file and be putting it while nfsd_file_close_inode_sync() is trying
to close all files for an inode. If this is possible, that other thread
could simply call nfsd_file_put() a little bit later and the result
would be the same: not all files are closed when
nfsd_file_close_inode_sync() completes.
If this was really a problem, we would need to wait in close_inode_sync
for the other references to be dropped. We probably don't want to do
that.
So it is best to simply remove this code.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fb9b1656a287..909b5bc72bd3 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -370,22 +370,8 @@ nfsd_file_put(struct nfsd_file *nf)
if (refcount_dec_not_one(&nf->nf_ref))
return;
- /* Try to add it to the LRU. If that fails, decrement. */
- if (nfsd_file_lru_add(nf)) {
- /* If it's still hashed, we're done */
- if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
- nfsd_file_schedule_laundrette();
- return;
- }
-
- /*
- * We're racing with unhashing, so try to remove it from
- * the LRU. If removal fails, then someone else already
- * has our reference.
- */
- if (!nfsd_file_lru_remove(nf))
- return;
- }
+ if (nfsd_file_lru_add(nf))
+ return;
}
if (refcount_dec_and_test(&nf->nf_ref))
nfsd_file_free(nf);
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker()
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
2025-02-18 15:39 ` [PATCH v2 1/7] nfsd: filecache: remove race handling cel
@ 2025-02-18 15:39 ` cel
2025-02-18 19:59 ` Jeff Layton
2025-02-19 0:33 ` Dave Chinner
2025-02-18 15:39 ` [PATCH v2 3/7] nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync() cel
` (5 subsequent siblings)
7 siblings, 2 replies; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Dave opines:
IMO, there is no need to do this unnecessary work on every object
that is added to the LRU. Changing the gc worker to always run
every 2s and check if it has work to do like so:
static void
nfsd_file_gc_worker(struct work_struct *work)
{
- nfsd_file_gc();
- if (list_lru_count(&nfsd_file_lru))
- nfsd_file_schedule_laundrette();
+ if (list_lru_count(&nfsd_file_lru))
+ nfsd_file_gc();
+ nfsd_file_schedule_laundrette();
}
means that nfsd_file_gc() will be run the same way and have the same
behaviour as the current code. When the system it idle, it does a
list_lru_count() check every 2 seconds and goes back to sleep.
That's going to be pretty much unnoticable on most machines that run
NFS servers.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 909b5bc72bd3..2933cba1e5f4 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -549,9 +549,9 @@ nfsd_file_gc(void)
static void
nfsd_file_gc_worker(struct work_struct *work)
{
- nfsd_file_gc();
+ nfsd_file_schedule_laundrette();
if (list_lru_count(&nfsd_file_lru))
- nfsd_file_schedule_laundrette();
+ nfsd_file_gc();
}
static unsigned long
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 3/7] nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
2025-02-18 15:39 ` [PATCH v2 1/7] nfsd: filecache: remove race handling cel
2025-02-18 15:39 ` [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker() cel
@ 2025-02-18 15:39 ` cel
2025-02-18 15:39 ` [PATCH v2 4/7] nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc() cel
` (4 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
From: NeilBrown <neilb@suse.de>
nfsd_file_close_inode_sync() contains an exact copy of
nfsd_file_dispose_list().
This patch removes that copy and calls nfsd_file_dispose_list()
instead.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2933cba1e5f4..143b5993a437 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -672,17 +672,12 @@ nfsd_file_close_inode(struct inode *inode)
void
nfsd_file_close_inode_sync(struct inode *inode)
{
- struct nfsd_file *nf;
LIST_HEAD(dispose);
trace_nfsd_file_close(inode);
nfsd_file_queue_for_close(inode, &dispose);
- while (!list_empty(&dispose)) {
- nf = list_first_entry(&dispose, struct nfsd_file, nf_gc);
- list_del_init(&nf->nf_gc);
- nfsd_file_free(nf);
- }
+ nfsd_file_dispose_list(&dispose);
}
static int
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 4/7] nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
` (2 preceding siblings ...)
2025-02-18 15:39 ` [PATCH v2 3/7] nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync() cel
@ 2025-02-18 15:39 ` cel
2025-02-18 15:39 ` [PATCH v2 5/7] nfsd: filecache: introduce NFSD_FILE_RECENT cel
` (3 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
From: NeilBrown <neilb@suse.de>
list_lru_walk() is only useful when the aim is to remove all elements
from the list_lru. It will repeatedly visit rotated elements of the
first per-node sublist before proceeding to subsequent sublists.
This patch changes nfsd_file_gc() to use list_lru_walk_node() and
list_lru_count_node() on each NUMA node.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 143b5993a437..747929c8c0d5 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -537,11 +537,16 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
static void
nfsd_file_gc(void)
{
+ unsigned long ret = 0;
LIST_HEAD(dispose);
- unsigned long ret;
+ int nid;
- ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
- &dispose, list_lru_count(&nfsd_file_lru));
+ for_each_node_state(nid, N_NORMAL_MEMORY) {
+ unsigned long nr = list_lru_count_node(&nfsd_file_lru, nid);
+
+ ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_lru_cb,
+ &dispose, &nr);
+ }
trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
nfsd_file_dispose_list_delayed(&dispose);
}
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 5/7] nfsd: filecache: introduce NFSD_FILE_RECENT
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
` (3 preceding siblings ...)
2025-02-18 15:39 ` [PATCH v2 4/7] nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc() cel
@ 2025-02-18 15:39 ` cel
2025-02-18 15:39 ` [PATCH v2 6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list cel
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
From: NeilBrown <neilb@suse.de>
The filecache lru is walked in 2 circumstances for 2 different reasons.
1/ When called from the shrinker we want to discard the first few
entries on the list, ignoring any with NFSD_FILE_REFERENCED set
because they should really be at the end of the LRU as they have been
referenced recently. So those ones are ROTATED.
2/ When called from the nfsd_file_gc() timer function we want to discard
anything that hasn't been used since before the previous call, and
mark everything else as unused at this point in time.
Using the same flag for both of these can result in some unexpected
outcomes. If the shrinker callback clears NFSD_FILE_REFERENCED then
nfsd_file_gc() will think the file hasn't been used in a while, while
really it has.
I think it is easier to reason about the behaviour if we instead have
two flags.
NFSD_FILE_REFERENCED means "this should be at the end of the LRU, please
put it there when convenient"
NFSD_FILE_RECENT means "this has been used recently - since the last
run of nfsd_file_gc()
When either caller finds an NFSD_FILE_REFERENCED entry, that entry
should be moved to the end of the LRU and the flag cleared. This can
safely happen at any time. The actual order on the lru might not be
strictly least-recently-used, but that is normal for linux lrus.
The shrinker callback can ignore the "recent" flag. If it ends up
freeing something that is "recent" that simply means that memory
pressure is sufficient to limit the acceptable cache age to less than
the nfsd_file_gc frequency.
The gc callback should primarily focus on NFSD_FILE_RECENT. It should
free everything that doesn't have this flag set, and should clear the
flag on everything else. When it clears the flag it is convenient to
clear the "REFERENCED" flag and move to the end of the LRU too.
With this, calls from the shrinker do not prematurely age files. It
will focus only on freeing those that are least recently used.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 22 ++++++++++++++++++++--
fs/nfsd/filecache.h | 1 +
fs/nfsd/trace.h | 3 +++
3 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 747929c8c0d5..0d621833a9f2 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -319,10 +319,10 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
}
-
static bool nfsd_file_lru_add(struct nfsd_file *nf)
{
set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+ set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
trace_nfsd_file_lru_add(nf);
return true;
@@ -534,6 +534,24 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
return LRU_REMOVED;
}
+static enum lru_status
+nfsd_file_gc_cb(struct list_head *item, struct list_lru_one *lru,
+ void *arg)
+{
+ struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru);
+
+ if (test_and_clear_bit(NFSD_FILE_RECENT, &nf->nf_flags)) {
+ /*
+ * "REFERENCED" really means "should be at the end of the
+ * LRU. As we are putting it there we can clear the flag.
+ */
+ clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+ trace_nfsd_file_gc_aged(nf);
+ return LRU_ROTATE;
+ }
+ return nfsd_file_lru_cb(item, lru, arg);
+}
+
static void
nfsd_file_gc(void)
{
@@ -544,7 +562,7 @@ nfsd_file_gc(void)
for_each_node_state(nid, N_NORMAL_MEMORY) {
unsigned long nr = list_lru_count_node(&nfsd_file_lru, nid);
- ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_lru_cb,
+ ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_gc_cb,
&dispose, &nr);
}
trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index d5db6b34ba30..de5b8aa7fcb0 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -38,6 +38,7 @@ struct nfsd_file {
#define NFSD_FILE_PENDING (1)
#define NFSD_FILE_REFERENCED (2)
#define NFSD_FILE_GC (3)
+#define NFSD_FILE_RECENT (4)
unsigned long nf_flags;
refcount_t nf_ref;
unsigned char nf_may;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 49bbd26ffcdb..d93573504fa4 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1050,6 +1050,7 @@ DEFINE_CLID_EVENT(confirmed_r);
{ 1 << NFSD_FILE_HASHED, "HASHED" }, \
{ 1 << NFSD_FILE_PENDING, "PENDING" }, \
{ 1 << NFSD_FILE_REFERENCED, "REFERENCED" }, \
+ { 1 << NFSD_FILE_RECENT, "RECENT" }, \
{ 1 << NFSD_FILE_GC, "GC" })
DECLARE_EVENT_CLASS(nfsd_file_class,
@@ -1328,6 +1329,7 @@ DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_aged);
DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed);
DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class,
@@ -1357,6 +1359,7 @@ DEFINE_EVENT(nfsd_file_lruwalk_class, name, \
TP_ARGS(removed, remaining))
DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed);
+DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_recent);
DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed);
TRACE_EVENT(nfsd_file_close,
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
` (4 preceding siblings ...)
2025-02-18 15:39 ` [PATCH v2 5/7] nfsd: filecache: introduce NFSD_FILE_RECENT cel
@ 2025-02-18 15:39 ` cel
2025-02-18 20:27 ` Jeff Layton
2025-02-18 15:39 ` [PATCH v2 7/7] nfsd: filecache: drop the list_lru lock during lock gc scans cel
2025-02-20 18:22 ` [PATCH v2 0/7] nfsd: filecache: various fixes Chuck Lever
7 siblings, 1 reply; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
From: NeilBrown <neilb@suse.de>
There is no need to remove a file from the lru every time we access it,
and then add it back. It is sufficient to set the REFERENCED flag every
time we put the file. The order in the lru of REFERENCED files is
largely irrelevant as they will all be moved to the end.
With this patch, files are added only when they are allocated (if
want_gc) and they are removed only by the list_lru_(shrink_)walk
callback or when forcibly removing a file.
This should reduce contention on the list_lru spinlock(s) and reduce
memory traffic a little.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 47 ++++++++++++++++-----------------------------
1 file changed, 17 insertions(+), 30 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 0d621833a9f2..56935349f0e4 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -319,15 +319,14 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
}
-static bool nfsd_file_lru_add(struct nfsd_file *nf)
+static void nfsd_file_lru_add(struct nfsd_file *nf)
{
- set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
- set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
- if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
+ refcount_inc(&nf->nf_ref);
+ if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru))
trace_nfsd_file_lru_add(nf);
- return true;
- }
- return false;
+ else
+ WARN_ON(1);
+ nfsd_file_schedule_laundrette();
}
static bool nfsd_file_lru_remove(struct nfsd_file *nf)
@@ -363,16 +362,10 @@ nfsd_file_put(struct nfsd_file *nf)
if (test_bit(NFSD_FILE_GC, &nf->nf_flags) &&
test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
- /*
- * If this is the last reference (nf_ref == 1), then try to
- * transfer it to the LRU.
- */
- if (refcount_dec_not_one(&nf->nf_ref))
- return;
-
- if (nfsd_file_lru_add(nf))
- return;
+ set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+ set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
}
+
if (refcount_dec_and_test(&nf->nf_ref))
nfsd_file_free(nf);
}
@@ -516,13 +509,12 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
}
/*
- * Put the reference held on behalf of the LRU. If it wasn't the last
- * one, then just remove it from the LRU and ignore it.
+ * Put the reference held on behalf of the LRU if it is the last
+ * reference, else rotate.
*/
- if (!refcount_dec_and_test(&nf->nf_ref)) {
+ if (!refcount_dec_if_one(&nf->nf_ref)) {
trace_nfsd_file_gc_in_use(nf);
- list_lru_isolate(lru, &nf->nf_lru);
- return LRU_REMOVED;
+ return LRU_ROTATE;
}
/* Refcount went to zero. Unhash it and queue it to the dispose list */
@@ -1062,16 +1054,8 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
nf = nfsd_file_lookup_locked(net, current_cred(), inode, need, want_gc);
rcu_read_unlock();
- if (nf) {
- /*
- * If the nf is on the LRU then it holds an extra reference
- * that must be put if it's removed. It had better not be
- * the last one however, since we should hold another.
- */
- if (nfsd_file_lru_remove(nf))
- refcount_dec(&nf->nf_ref);
+ if (nf)
goto wait_for_construction;
- }
new = nfsd_file_alloc(net, inode, need, want_gc);
if (!new) {
@@ -1165,6 +1149,9 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
*/
if (status != nfs_ok || inode->i_nlink == 0)
nfsd_file_unhash(nf);
+ else if (want_gc)
+ nfsd_file_lru_add(nf);
+
clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags);
if (status == nfs_ok)
goto out;
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 7/7] nfsd: filecache: drop the list_lru lock during lock gc scans
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
` (5 preceding siblings ...)
2025-02-18 15:39 ` [PATCH v2 6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list cel
@ 2025-02-18 15:39 ` cel
2025-02-18 20:51 ` Jeff Layton
2025-02-20 18:22 ` [PATCH v2 0/7] nfsd: filecache: various fixes Chuck Lever
7 siblings, 1 reply; 15+ messages in thread
From: cel @ 2025-02-18 15:39 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
From: NeilBrown <neilb@suse.de>
Under a high NFSv3 load with lots of different files being accessed,
the LRU list of garbage-collectable files can become quite long.
Asking list_lru_scan_node() to scan the whole list can result in a long
period during which a spinlock is held, blocking the addition of new LRU
items.
So ask list_lru_scan_node() to scan only a few entries at a time, and
repeat until the scan is complete.
If the shrinker runs between two consecutive calls of
list_lru_scan_node() it could invalidate the "remaining" counter which
could lead to premature freeing. So add a spinlock to avoid that.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/filecache.c | 27 ++++++++++++++++++++++++---
fs/nfsd/filecache.h | 6 ++++++
2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 56935349f0e4..9a41ccfc2df6 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -544,6 +544,13 @@ nfsd_file_gc_cb(struct list_head *item, struct list_lru_one *lru,
return nfsd_file_lru_cb(item, lru, arg);
}
+/* If the shrinker runs between calls to list_lru_walk_node() in
+ * nfsd_file_gc(), the "remaining" count will be wrong. This could
+ * result in premature freeing of some files. This may not matter much
+ * but is easy to fix with this spinlock which temporarily disables
+ * the shrinker.
+ */
+static DEFINE_SPINLOCK(nfsd_gc_lock);
static void
nfsd_file_gc(void)
{
@@ -551,12 +558,22 @@ nfsd_file_gc(void)
LIST_HEAD(dispose);
int nid;
+ spin_lock(&nfsd_gc_lock);
for_each_node_state(nid, N_NORMAL_MEMORY) {
- unsigned long nr = list_lru_count_node(&nfsd_file_lru, nid);
+ unsigned long remaining = list_lru_count_node(&nfsd_file_lru, nid);
- ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_gc_cb,
- &dispose, &nr);
+ while (remaining > 0) {
+ unsigned long nr = min(remaining, NFSD_FILE_GC_BATCH);
+
+ remaining -= nr;
+ ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_gc_cb,
+ &dispose, &nr);
+ if (nr)
+ /* walk aborted early */
+ remaining = 0;
+ }
}
+ spin_unlock(&nfsd_gc_lock);
trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
nfsd_file_dispose_list_delayed(&dispose);
}
@@ -581,8 +598,12 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
LIST_HEAD(dispose);
unsigned long ret;
+ if (!spin_trylock(&nfsd_gc_lock))
+ return SHRINK_STOP;
+
ret = list_lru_shrink_walk(&nfsd_file_lru, sc,
nfsd_file_lru_cb, &dispose);
+ spin_unlock(&nfsd_gc_lock);
trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru));
nfsd_file_dispose_list_delayed(&dispose);
return ret;
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index de5b8aa7fcb0..5865f9c72712 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -3,6 +3,12 @@
#include <linux/fsnotify_backend.h>
+/*
+ * Limit the time that the list_lru_one lock is held during
+ * an LRU scan.
+ */
+#define NFSD_FILE_GC_BATCH (16UL)
+
/*
* This is the fsnotify_mark container that nfsd attaches to the files that it
* is holding open. Note that we have a separate refcount here aside from the
--
2.47.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker()
2025-02-18 15:39 ` [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker() cel
@ 2025-02-18 19:59 ` Jeff Layton
2025-02-19 0:33 ` Dave Chinner
1 sibling, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-02-18 19:59 UTC (permalink / raw)
To: cel, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner, Chuck Lever
On Tue, 2025-02-18 at 10:39 -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Dave opines:
>
> IMO, there is no need to do this unnecessary work on every object
> that is added to the LRU. Changing the gc worker to always run
> every 2s and check if it has work to do like so:
>
> static void
> nfsd_file_gc_worker(struct work_struct *work)
> {
> - nfsd_file_gc();
> - if (list_lru_count(&nfsd_file_lru))
> - nfsd_file_schedule_laundrette();
> + if (list_lru_count(&nfsd_file_lru))
> + nfsd_file_gc();
> + nfsd_file_schedule_laundrette();
> }
>
> means that nfsd_file_gc() will be run the same way and have the same
> behaviour as the current code. When the system it idle, it does a
> list_lru_count() check every 2 seconds and goes back to sleep.
> That's going to be pretty much unnoticable on most machines that run
> NFS servers.
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/filecache.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 909b5bc72bd3..2933cba1e5f4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -549,9 +549,9 @@ nfsd_file_gc(void)
> static void
> nfsd_file_gc_worker(struct work_struct *work)
> {
> - nfsd_file_gc();
> + nfsd_file_schedule_laundrette();
> if (list_lru_count(&nfsd_file_lru))
> - nfsd_file_schedule_laundrette();
> + nfsd_file_gc();
> }
>
> static unsigned long
Given that it's a delayed workqueue job, it probably doesn't matter,
but why schedule the laundrette before doing the nfsd_file_gc() call?
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list
2025-02-18 15:39 ` [PATCH v2 6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list cel
@ 2025-02-18 20:27 ` Jeff Layton
0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-02-18 20:27 UTC (permalink / raw)
To: cel, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
On Tue, 2025-02-18 at 10:39 -0500, cel@kernel.org wrote:
> From: NeilBrown <neilb@suse.de>
>
> There is no need to remove a file from the lru every time we access it,
> and then add it back. It is sufficient to set the REFERENCED flag every
> time we put the file. The order in the lru of REFERENCED files is
> largely irrelevant as they will all be moved to the end.
>
> With this patch, files are added only when they are allocated (if
> want_gc) and they are removed only by the list_lru_(shrink_)walk
> callback or when forcibly removing a file.
>
> This should reduce contention on the list_lru spinlock(s) and reduce
> memory traffic a little.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/filecache.c | 47 ++++++++++++++++-----------------------------
> 1 file changed, 17 insertions(+), 30 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 0d621833a9f2..56935349f0e4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -319,15 +319,14 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
> mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
> }
>
> -static bool nfsd_file_lru_add(struct nfsd_file *nf)
> +static void nfsd_file_lru_add(struct nfsd_file *nf)
> {
> - set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> - set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
> - if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
> + refcount_inc(&nf->nf_ref);
> + if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru))
> trace_nfsd_file_lru_add(nf);
> - return true;
> - }
> - return false;
> + else
> + WARN_ON(1);
> + nfsd_file_schedule_laundrette();
> }
>
> static bool nfsd_file_lru_remove(struct nfsd_file *nf)
> @@ -363,16 +362,10 @@ nfsd_file_put(struct nfsd_file *nf)
>
> if (test_bit(NFSD_FILE_GC, &nf->nf_flags) &&
> test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> - /*
> - * If this is the last reference (nf_ref == 1), then try to
> - * transfer it to the LRU.
> - */
> - if (refcount_dec_not_one(&nf->nf_ref))
> - return;
> -
> - if (nfsd_file_lru_add(nf))
> - return;
> + set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> + set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
> }
> +
> if (refcount_dec_and_test(&nf->nf_ref))
> nfsd_file_free(nf);
> }
> @@ -516,13 +509,12 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
> }
>
> /*
> - * Put the reference held on behalf of the LRU. If it wasn't the last
> - * one, then just remove it from the LRU and ignore it.
> + * Put the reference held on behalf of the LRU if it is the last
> + * reference, else rotate.
> */
> - if (!refcount_dec_and_test(&nf->nf_ref)) {
> + if (!refcount_dec_if_one(&nf->nf_ref)) {
> trace_nfsd_file_gc_in_use(nf);
> - list_lru_isolate(lru, &nf->nf_lru);
> - return LRU_REMOVED;
> + return LRU_ROTATE;
> }
>
> /* Refcount went to zero. Unhash it and queue it to the dispose list */
> @@ -1062,16 +1054,8 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
> nf = nfsd_file_lookup_locked(net, current_cred(), inode, need, want_gc);
> rcu_read_unlock();
>
> - if (nf) {
> - /*
> - * If the nf is on the LRU then it holds an extra reference
> - * that must be put if it's removed. It had better not be
> - * the last one however, since we should hold another.
> - */
> - if (nfsd_file_lru_remove(nf))
> - refcount_dec(&nf->nf_ref);
> + if (nf)
> goto wait_for_construction;
> - }
>
> new = nfsd_file_alloc(net, inode, need, want_gc);
> if (!new) {
> @@ -1165,6 +1149,9 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
> */
> if (status != nfs_ok || inode->i_nlink == 0)
> nfsd_file_unhash(nf);
> + else if (want_gc)
> + nfsd_file_lru_add(nf);
> +
> clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> if (status == nfs_ok)
> goto out;
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 7/7] nfsd: filecache: drop the list_lru lock during lock gc scans
2025-02-18 15:39 ` [PATCH v2 7/7] nfsd: filecache: drop the list_lru lock during lock gc scans cel
@ 2025-02-18 20:51 ` Jeff Layton
0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-02-18 20:51 UTC (permalink / raw)
To: cel, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Dave Chinner
On Tue, 2025-02-18 at 10:39 -0500, cel@kernel.org wrote:
> From: NeilBrown <neilb@suse.de>
>
> Under a high NFSv3 load with lots of different files being accessed,
> the LRU list of garbage-collectable files can become quite long.
>
> Asking list_lru_scan_node() to scan the whole list can result in a long
> period during which a spinlock is held, blocking the addition of new LRU
> items.
>
> So ask list_lru_scan_node() to scan only a few entries at a time, and
> repeat until the scan is complete.
>
> If the shrinker runs between two consecutive calls of
> list_lru_scan_node() it could invalidate the "remaining" counter which
> could lead to premature freeing. So add a spinlock to avoid that.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/filecache.c | 27 ++++++++++++++++++++++++---
> fs/nfsd/filecache.h | 6 ++++++
> 2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 56935349f0e4..9a41ccfc2df6 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -544,6 +544,13 @@ nfsd_file_gc_cb(struct list_head *item, struct list_lru_one *lru,
> return nfsd_file_lru_cb(item, lru, arg);
> }
>
> +/* If the shrinker runs between calls to list_lru_walk_node() in
> + * nfsd_file_gc(), the "remaining" count will be wrong. This could
> + * result in premature freeing of some files. This may not matter much
> + * but is easy to fix with this spinlock which temporarily disables
> + * the shrinker.
> + */
> +static DEFINE_SPINLOCK(nfsd_gc_lock);
Having this as a global lock makes sense since there is just a single
shrinker and laundrette for the whole kernel. I don't think it's
worthwhile to make them per-net or anything either.
> static void
> nfsd_file_gc(void)
> {
> @@ -551,12 +558,22 @@ nfsd_file_gc(void)
> LIST_HEAD(dispose);
> int nid;
>
> + spin_lock(&nfsd_gc_lock);
> for_each_node_state(nid, N_NORMAL_MEMORY) {
> - unsigned long nr = list_lru_count_node(&nfsd_file_lru, nid);
> + unsigned long remaining = list_lru_count_node(&nfsd_file_lru, nid);
>
> - ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_gc_cb,
> - &dispose, &nr);
> + while (remaining > 0) {
> + unsigned long nr = min(remaining, NFSD_FILE_GC_BATCH);
> +
> + remaining -= nr;
> + ret += list_lru_walk_node(&nfsd_file_lru, nid, nfsd_file_gc_cb,
> + &dispose, &nr);
> + if (nr)
> + /* walk aborted early */
> + remaining = 0;
> + }
> }
Now that I look, if we end up walking a long list on a different NUMA
node, this could mean a lot of cross-node calls.
This is probably in the "further work" category, but...
Maybe we should switch the laundrette to have a work struct per-node,
and then schedule all of them on their respective nodes when we start
the laundrette.
If you do that, then the nfsd_gc_lock could also be per-node.
> + spin_unlock(&nfsd_gc_lock);
> trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
> nfsd_file_dispose_list_delayed(&dispose);
> }
> @@ -581,8 +598,12 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
> LIST_HEAD(dispose);
> unsigned long ret;
>
> + if (!spin_trylock(&nfsd_gc_lock))
> + return SHRINK_STOP;
> +
> ret = list_lru_shrink_walk(&nfsd_file_lru, sc,
> nfsd_file_lru_cb, &dispose);
> + spin_unlock(&nfsd_gc_lock);
> trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru));
> nfsd_file_dispose_list_delayed(&dispose);
> return ret;
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index de5b8aa7fcb0..5865f9c72712 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -3,6 +3,12 @@
>
> #include <linux/fsnotify_backend.h>
>
> +/*
> + * Limit the time that the list_lru_one lock is held during
> + * an LRU scan.
> + */
> +#define NFSD_FILE_GC_BATCH (16UL)
> +
> /*
> * This is the fsnotify_mark container that nfsd attaches to the files that it
> * is holding open. Note that we have a separate refcount here aside from the
No objection to this patch as an interim step though.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker()
2025-02-18 15:39 ` [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker() cel
2025-02-18 19:59 ` Jeff Layton
@ 2025-02-19 0:33 ` Dave Chinner
2025-02-19 1:20 ` NeilBrown
2025-02-19 14:01 ` Chuck Lever
1 sibling, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2025-02-19 0:33 UTC (permalink / raw)
To: cel
Cc: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs, Chuck Lever
On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Dave opines:
>
> IMO, there is no need to do this unnecessary work on every object
> that is added to the LRU. Changing the gc worker to always run
> every 2s and check if it has work to do like so:
>
> static void
> nfsd_file_gc_worker(struct work_struct *work)
> {
> - nfsd_file_gc();
> - if (list_lru_count(&nfsd_file_lru))
> - nfsd_file_schedule_laundrette();
> + if (list_lru_count(&nfsd_file_lru))
> + nfsd_file_gc();
> + nfsd_file_schedule_laundrette();
> }
>
> means that nfsd_file_gc() will be run the same way and have the same
> behaviour as the current code. When the system it idle, it does a
> list_lru_count() check every 2 seconds and goes back to sleep.
> That's going to be pretty much unnoticable on most machines that run
> NFS servers.
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/filecache.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 909b5bc72bd3..2933cba1e5f4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -549,9 +549,9 @@ nfsd_file_gc(void)
> static void
> nfsd_file_gc_worker(struct work_struct *work)
> {
> - nfsd_file_gc();
> + nfsd_file_schedule_laundrette();
> if (list_lru_count(&nfsd_file_lru))
> - nfsd_file_schedule_laundrette();
> + nfsd_file_gc();
> }
IMO, the scheduling of new work is the wrong way around. It should
be done on completion of gc work, not before gc work is started.
i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt,
etc), then a new gc worker will be started in 2s regardless of
whether the currently running gc worker has completed or not.
Worse case, there's a spinlock hang bug in nfsd_file_gc(). This code
will end up with N worker threads all spinning up in nfsd_file_gc()
chewing up all the CPU in the system, not making any progress....
If we schedule new work after completion of this work, then gc might
hang but it won't slowly drag the entire system down with it.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker()
2025-02-19 0:33 ` Dave Chinner
@ 2025-02-19 1:20 ` NeilBrown
2025-02-19 14:01 ` Chuck Lever
1 sibling, 0 replies; 15+ messages in thread
From: NeilBrown @ 2025-02-19 1:20 UTC (permalink / raw)
To: Dave Chinner
Cc: cel, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs, Chuck Lever
On Wed, 19 Feb 2025, Dave Chinner wrote:
> On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > Dave opines:
> >
> > IMO, there is no need to do this unnecessary work on every object
> > that is added to the LRU. Changing the gc worker to always run
> > every 2s and check if it has work to do like so:
> >
> > static void
> > nfsd_file_gc_worker(struct work_struct *work)
> > {
> > - nfsd_file_gc();
> > - if (list_lru_count(&nfsd_file_lru))
> > - nfsd_file_schedule_laundrette();
> > + if (list_lru_count(&nfsd_file_lru))
> > + nfsd_file_gc();
> > + nfsd_file_schedule_laundrette();
> > }
> >
> > means that nfsd_file_gc() will be run the same way and have the same
> > behaviour as the current code. When the system it idle, it does a
> > list_lru_count() check every 2 seconds and goes back to sleep.
> > That's going to be pretty much unnoticable on most machines that run
> > NFS servers.
> >
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> > fs/nfsd/filecache.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 909b5bc72bd3..2933cba1e5f4 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -549,9 +549,9 @@ nfsd_file_gc(void)
> > static void
> > nfsd_file_gc_worker(struct work_struct *work)
> > {
> > - nfsd_file_gc();
> > + nfsd_file_schedule_laundrette();
> > if (list_lru_count(&nfsd_file_lru))
> > - nfsd_file_schedule_laundrette();
> > + nfsd_file_gc();
> > }
>
> IMO, the scheduling of new work is the wrong way around. It should
> be done on completion of gc work, not before gc work is started.
>
> i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt,
> etc), then a new gc worker will be started in 2s regardless of
> whether the currently running gc worker has completed or not.
>
> Worse case, there's a spinlock hang bug in nfsd_file_gc(). This code
> will end up with N worker threads all spinning up in nfsd_file_gc()
> chewing up all the CPU in the system, not making any progress....
> If we schedule new work after completion of this work, then gc might
> hang but it won't slowly drag the entire system down with it.
While I agree that the enqueue is best done later rather than earlier, I
think your worst-case is over-stated.
queue_delayed_work() is a no-op if WORK_STRUCT_PENDING_BIT is still set.
A given work_struct can only be running once.
If the timer fires while nfsd_file_gc() is still running,
nfsd_filecache_laundrette will be queued to start immediately that the
currently running instance completes. So the worst cases is that
there will always be one instance running.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker()
2025-02-19 0:33 ` Dave Chinner
2025-02-19 1:20 ` NeilBrown
@ 2025-02-19 14:01 ` Chuck Lever
1 sibling, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2025-02-19 14:01 UTC (permalink / raw)
To: Dave Chinner, cel
Cc: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs
On 2/18/25 7:33 PM, Dave Chinner wrote:
> On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Dave opines:
>>
>> IMO, there is no need to do this unnecessary work on every object
>> that is added to the LRU. Changing the gc worker to always run
>> every 2s and check if it has work to do like so:
>>
>> static void
>> nfsd_file_gc_worker(struct work_struct *work)
>> {
>> - nfsd_file_gc();
>> - if (list_lru_count(&nfsd_file_lru))
>> - nfsd_file_schedule_laundrette();
>> + if (list_lru_count(&nfsd_file_lru))
>> + nfsd_file_gc();
>> + nfsd_file_schedule_laundrette();
>> }
>>
>> means that nfsd_file_gc() will be run the same way and have the same
>> behaviour as the current code. When the system it idle, it does a
>> list_lru_count() check every 2 seconds and goes back to sleep.
>> That's going to be pretty much unnoticable on most machines that run
>> NFS servers.
>>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/filecache.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 909b5bc72bd3..2933cba1e5f4 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -549,9 +549,9 @@ nfsd_file_gc(void)
>> static void
>> nfsd_file_gc_worker(struct work_struct *work)
>> {
>> - nfsd_file_gc();
>> + nfsd_file_schedule_laundrette();
>> if (list_lru_count(&nfsd_file_lru))
>> - nfsd_file_schedule_laundrette();
>> + nfsd_file_gc();
>> }
>
> IMO, the scheduling of new work is the wrong way around. It should
> be done on completion of gc work, not before gc work is started.
>
> i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt,
> etc), then a new gc worker will be started in 2s regardless of
> whether the currently running gc worker has completed or not.
>
> Worse case, there's a spinlock hang bug in nfsd_file_gc(). This code
> will end up with N worker threads all spinning up in nfsd_file_gc()
> chewing up all the CPU in the system, not making any progress....
> If we schedule new work after completion of this work, then gc might
> hang but it won't slowly drag the entire system down with it.
My bad. I miscopied your suggestion. Will fix in my tree.
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/7] nfsd: filecache: various fixes
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
` (6 preceding siblings ...)
2025-02-18 15:39 ` [PATCH v2 7/7] nfsd: filecache: drop the list_lru lock during lock gc scans cel
@ 2025-02-20 18:22 ` Chuck Lever
7 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2025-02-20 18:22 UTC (permalink / raw)
To: cel, Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo,
Tom Talpey
Cc: linux-nfs, Dave Chinner
On 2/18/25 10:39 AM, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Redriving to continue review dialogue:
>
>> following is my attempt to fix up shrinking and gc of the NFSv3 filecache entries.
>> There series is against nfsd-next.
>
> https://lore.kernel.org/linux-nfs/20250207051701.3467505-1-neilb@suse.de/
I've pulled this series (with suggested updates) into nfsd-testing to
gain a little more test exposure.
> Changes since original posting:
> - I've assumed the role of shepherd for this series
> - Rebase on the public nfsd-testing branch
> - Remove nfsd_file_laundrette() call from nfsd_file_put() earlier in
> the series
> - Rework nfsd_file_gc_worker()
> - Clarify one or two commit messages
>
> Chuck Lever (1):
> NFSD: Re-organize nfsd_file_gc_worker()
>
> NeilBrown (6):
> nfsd: filecache: remove race handling.
> nfsd: filecache: use nfsd_file_dispose_list() in
> nfsd_file_close_inode_sync()
> nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
> nfsd: filecache: introduce NFSD_FILE_RECENT
> nfsd: filecache: don't repeatedly add/remove files on the lru list
> nfsd: filecache: drop the list_lru lock during lock gc scans
>
> fs/nfsd/filecache.c | 122 ++++++++++++++++++++++++--------------------
> fs/nfsd/filecache.h | 7 +++
> fs/nfsd/trace.h | 3 ++
> 3 files changed, 77 insertions(+), 55 deletions(-)
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-02-20 18:22 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-18 15:39 [PATCH v2 0/7] nfsd: filecache: various fixes cel
2025-02-18 15:39 ` [PATCH v2 1/7] nfsd: filecache: remove race handling cel
2025-02-18 15:39 ` [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker() cel
2025-02-18 19:59 ` Jeff Layton
2025-02-19 0:33 ` Dave Chinner
2025-02-19 1:20 ` NeilBrown
2025-02-19 14:01 ` Chuck Lever
2025-02-18 15:39 ` [PATCH v2 3/7] nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync() cel
2025-02-18 15:39 ` [PATCH v2 4/7] nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc() cel
2025-02-18 15:39 ` [PATCH v2 5/7] nfsd: filecache: introduce NFSD_FILE_RECENT cel
2025-02-18 15:39 ` [PATCH v2 6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list cel
2025-02-18 20:27 ` Jeff Layton
2025-02-18 15:39 ` [PATCH v2 7/7] nfsd: filecache: drop the list_lru lock during lock gc scans cel
2025-02-18 20:51 ` Jeff Layton
2025-02-20 18:22 ` [PATCH v2 0/7] nfsd: filecache: various fixes Chuck Lever
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox