* [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
@ 2025-11-06 17:05 Dai Ngo
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Dai Ngo @ 2025-11-06 17:05 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack
Cc: linux-fsdevel, linux-kernel, linux-nfs
When a layout conflict triggers a call to __break_lease, the function
nfsd4_layout_lm_break clears the fl_break_time timeout before sending
the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
its loop, waiting indefinitely for the conflicting file lease to be
released.
If the number of lease conflicts matches the number of NFSD threads (which
defaults to 8), all available NFSD threads become occupied. Consequently,
there are no threads left to handle incoming requests or callback replies,
leading to a total hang of the NFS server.
This issue is reliably reproducible by running the Git test suite on a
configuration using SCSI layout.
This patchset fixes this problem by introducing the new lm_breaker_timedout
operation to lease_manager_operations and using timeout for layout
lease break.
Documentation/filesystems/locking.rst | 2 ++
fs/locks.c | 14 +++++++++++---
fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
include/linux/filelock.h | 2 ++
4 files changed, 36 insertions(+), 7 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 17:05 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
@ 2025-11-06 17:05 ` Dai Ngo
2025-11-07 13:26 ` Christoph Hellwig
2025-11-06 17:05 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Dai Ngo @ 2025-11-06 17:05 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack
Cc: linux-fsdevel, linux-kernel, linux-nfs
Some consumers of the lease_manager_operations need to perform additional
actions when a lease break, triggered by a conflict, times out.
The NFS server is the first consumer of this operation.
When a pNFS layout conflict occurs, and the lease break times out -
resulting in the layout being revoked and its file_lease beeing removed
from the flc_lease list, the NFS server must issue a fence operation.
This ensures that the client is prevented from accessing the data
server after the layout is revoked.
Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
Documentation/filesystems/locking.rst | 2 ++
fs/locks.c | 14 +++++++++++---
include/linux/filelock.h | 2 ++
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 77704fde9845..cd600db6c4b9 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -403,6 +403,7 @@ prototypes::
bool (*lm_breaker_owns_lease)(struct file_lock *);
bool (*lm_lock_expirable)(struct file_lock *);
void (*lm_expire_lock)(void);
+ void (*lm_breaker_timedout)(struct file_lease *);
locking rules:
@@ -416,6 +417,7 @@ lm_change yes no no
lm_breaker_owns_lease: yes no no
lm_lock_expirable yes no no
lm_expire_lock no no yes
+lm_breaker_timedout no no yes
====================== ============= ================= =========
buffer_head
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..1f254e0cd398 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -369,9 +369,15 @@ locks_dispose_list(struct list_head *dispose)
while (!list_empty(dispose)) {
flc = list_first_entry(dispose, struct file_lock_core, flc_list);
list_del_init(&flc->flc_list);
- if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
+ if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+ if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
+ struct file_lease *fl = file_lease(flc);
+
+ if (fl->fl_lmops->lm_breaker_timedout)
+ fl->fl_lmops->lm_breaker_timedout(fl);
+ }
locks_free_lease(file_lease(flc));
- else
+ } else
locks_free_lock(file_lock(flc));
}
}
@@ -1482,8 +1488,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
trace_time_out_leases(inode, fl);
if (past_time(fl->fl_downgrade_time))
lease_modify(fl, F_RDLCK, dispose);
- if (past_time(fl->fl_break_time))
+ if (past_time(fl->fl_break_time)) {
lease_modify(fl, F_UNLCK, dispose);
+ fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
+ }
}
}
diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index c2ce8ba05d06..06ccd6b66012 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -17,6 +17,7 @@
#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
#define FL_LAYOUT 2048 /* outstanding pNFS layout */
#define FL_RECLAIM 4096 /* reclaiming from a reboot server */
+#define FL_BREAKER_TIMEDOUT 8192 /* lease breaker timed out */
#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
@@ -49,6 +50,7 @@ struct lease_manager_operations {
int (*lm_change)(struct file_lease *, int, struct list_head *);
void (*lm_setup)(struct file_lease *, void **);
bool (*lm_breaker_owns_lease)(struct file_lease *);
+ void (*lm_breaker_timedout)(struct file_lease *fl);
};
struct lock_manager {
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:05 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
@ 2025-11-06 17:05 ` Dai Ngo
2025-11-07 13:29 ` Christoph Hellwig
2025-11-07 13:30 ` [Patch 0/2] " Christoph Hellwig
2025-11-09 18:34 ` Benjamin Coddington
3 siblings, 1 reply; 15+ messages in thread
From: Dai Ngo @ 2025-11-06 17:05 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack
Cc: linux-fsdevel, linux-kernel, linux-nfs
When a layout conflict triggers a call to __break_lease, the function
nfsd4_layout_lm_break clears the fl_break_time timeout before sending
the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
its loop, waiting indefinitely for the conflicting file lease to be
released.
If the number of lease conflicts matches the number of NFSD threads
(which defaults to 8), all available NFSD threads become occupied.
Consequently, there are no threads left to handle incoming requests
or callback replies, leading to a total hang of the NFS server.
This issue is reliably reproducible by running the Git test suite
on a configuration using SCSI layout.
This patch addresses the problem by using the break lease timeout
and ensures that the unresponsive client is fenced, preventing it from
accessing the data server directly.
Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 683bd1130afe..b9b1eb32624c 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -747,11 +747,10 @@ static bool
nfsd4_layout_lm_break(struct file_lease *fl)
{
/*
- * We don't want the locks code to timeout the lease for us;
- * we'll remove it ourself if a layout isn't returned
- * in time:
+ * Enforce break lease timeout to prevent starvation of
+ * NFSD threads in __break_lease that causes server to
+ * hang.
*/
- fl->fl_break_time = 0;
nfsd4_recall_file_layout(fl->c.flc_owner);
return false;
}
@@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
return lease_modify(onlist, arg, dispose);
}
+static void nfsd_layout_breaker_timedout(struct file_lease *fl)
+{
+ struct nfs4_layout_stateid *ls = fl->c.flc_owner;
+ struct nfsd_file *nf;
+
+ rcu_read_lock();
+ nf = nfsd_file_get(ls->ls_file);
+ rcu_read_unlock();
+ if (nf) {
+ int type = ls->ls_layout_type;
+
+ if (nfsd4_layout_ops[type]->fence_client)
+ nfsd4_layout_ops[type]->fence_client(ls, nf);
+ nfsd_file_put(nf);
+ }
+}
+
static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
.lm_break = nfsd4_layout_lm_break,
.lm_change = nfsd4_layout_lm_change,
+ .lm_breaker_timedout = nfsd_layout_breaker_timedout,
};
int
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
@ 2025-11-07 13:26 ` Christoph Hellwig
2025-11-07 16:58 ` Dai Ngo
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-11-07 13:26 UTC (permalink / raw)
To: Dai Ngo
Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On Thu, Nov 06, 2025 at 09:05:25AM -0800, Dai Ngo wrote:
> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
I don't think adding an operation can fix nfsd code. This is just
infrastructure you need for the fix that sits in the nfsd code.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:05 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
@ 2025-11-07 13:29 ` Christoph Hellwig
2025-11-07 17:01 ` Dai Ngo
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-11-07 13:29 UTC (permalink / raw)
To: Dai Ngo
Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On Thu, Nov 06, 2025 at 09:05:26AM -0800, Dai Ngo wrote:
> When a layout conflict triggers a call to __break_lease, the function
> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> its loop, waiting indefinitely for the conflicting file lease to be
> released.
>
> If the number of lease conflicts matches the number of NFSD threads
> (which defaults to 8), all available NFSD threads become occupied.
> Consequently, there are no threads left to handle incoming requests
> or callback replies, leading to a total hang of the NFS server.
>
> This issue is reliably reproducible by running the Git test suite
> on a configuration using SCSI layout.
>
> This patch addresses the problem by using the break lease timeout
> and ensures that the unresponsive client is fenced, preventing it from
> accessing the data server directly.
>
> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> index 683bd1130afe..b9b1eb32624c 100644
> --- a/fs/nfsd/nfs4layouts.c
> +++ b/fs/nfsd/nfs4layouts.c
> @@ -747,11 +747,10 @@ static bool
> nfsd4_layout_lm_break(struct file_lease *fl)
> {
> /*
> - * We don't want the locks code to timeout the lease for us;
> - * we'll remove it ourself if a layout isn't returned
> - * in time:
> + * Enforce break lease timeout to prevent starvation of
> + * NFSD threads in __break_lease that causes server to
> + * hang.
> */
> - fl->fl_break_time = 0;
> nfsd4_recall_file_layout(fl->c.flc_owner);
> return false;
> }
> @@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
> return lease_modify(onlist, arg, dispose);
> }
>
> +static void nfsd_layout_breaker_timedout(struct file_lease *fl)
> +{
> + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
> + struct nfsd_file *nf;
> +
> + rcu_read_lock();
> + nf = nfsd_file_get(ls->ls_file);
> + rcu_read_unlock();
> + if (nf) {
Just a little note on the existing infrastructure (and not this change
that uses it)h ere: I wish this would be nfsd_file_tryget and the
RCU locking was hidden in the helper. At least some users seems to
miss the RCU protection or rely on undocumented locks making it
not required (maybe?).
> + int type = ls->ls_layout_type;
ls_layout_type is a u32, so please use the same type.
Otherwise this looks good to me.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:05 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
2025-11-06 17:05 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
@ 2025-11-07 13:30 ` Christoph Hellwig
2025-11-09 18:34 ` Benjamin Coddington
3 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-11-07 13:30 UTC (permalink / raw)
To: Dai Ngo
Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On Thu, Nov 06, 2025 at 09:05:24AM -0800, Dai Ngo wrote:
> When a layout conflict triggers a call to __break_lease, the function
> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> its loop, waiting indefinitely for the conflicting file lease to be
> released.
>
> If the number of lease conflicts matches the number of NFSD threads (which
> defaults to 8), all available NFSD threads become occupied. Consequently,
> there are no threads left to handle incoming requests or callback replies,
> leading to a total hang of the NFS server.
>
> This issue is reliably reproducible by running the Git test suite on a
> configuration using SCSI layout.
I guess we need to implement asynchronous breaking of leases. Which
conceptually shouldn't be too hard.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-07 13:26 ` Christoph Hellwig
@ 2025-11-07 16:58 ` Dai Ngo
0 siblings, 0 replies; 15+ messages in thread
From: Dai Ngo @ 2025-11-07 16:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: chuck.lever, jlayton, neilb, okorniev, tom, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On 11/7/25 5:26 AM, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 09:05:25AM -0800, Dai Ngo wrote:
>> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
> I don't think adding an operation can fix nfsd code. This is just
> infrastructure you need for the fix that sits in the nfsd code.
The reason I added this tag is because patch 2/2 needs patch 1/2.
Since patch 2/2 has the fixes tag, could someone accidentally apply
it without patch 1/2?
-Dai
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-07 13:29 ` Christoph Hellwig
@ 2025-11-07 17:01 ` Dai Ngo
0 siblings, 0 replies; 15+ messages in thread
From: Dai Ngo @ 2025-11-07 17:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: chuck.lever, jlayton, neilb, okorniev, tom, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On 11/7/25 5:29 AM, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 09:05:26AM -0800, Dai Ngo wrote:
>> When a layout conflict triggers a call to __break_lease, the function
>> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
>> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
>> its loop, waiting indefinitely for the conflicting file lease to be
>> released.
>>
>> If the number of lease conflicts matches the number of NFSD threads
>> (which defaults to 8), all available NFSD threads become occupied.
>> Consequently, there are no threads left to handle incoming requests
>> or callback replies, leading to a total hang of the NFS server.
>>
>> This issue is reliably reproducible by running the Git test suite
>> on a configuration using SCSI layout.
>>
>> This patch addresses the problem by using the break lease timeout
>> and ensures that the unresponsive client is fenced, preventing it from
>> accessing the data server directly.
>>
>> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>> fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
>> index 683bd1130afe..b9b1eb32624c 100644
>> --- a/fs/nfsd/nfs4layouts.c
>> +++ b/fs/nfsd/nfs4layouts.c
>> @@ -747,11 +747,10 @@ static bool
>> nfsd4_layout_lm_break(struct file_lease *fl)
>> {
>> /*
>> - * We don't want the locks code to timeout the lease for us;
>> - * we'll remove it ourself if a layout isn't returned
>> - * in time:
>> + * Enforce break lease timeout to prevent starvation of
>> + * NFSD threads in __break_lease that causes server to
>> + * hang.
>> */
>> - fl->fl_break_time = 0;
>> nfsd4_recall_file_layout(fl->c.flc_owner);
>> return false;
>> }
>> @@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
>> return lease_modify(onlist, arg, dispose);
>> }
>>
>> +static void nfsd_layout_breaker_timedout(struct file_lease *fl)
>> +{
>> + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
>> + struct nfsd_file *nf;
>> +
>> + rcu_read_lock();
>> + nf = nfsd_file_get(ls->ls_file);
>> + rcu_read_unlock();
>> + if (nf) {
> Just a little note on the existing infrastructure (and not this change
> that uses it)h ere: I wish this would be nfsd_file_tryget and the
> RCU locking was hidden in the helper. At least some users seems to
> miss the RCU protection or rely on undocumented locks making it
> not required (maybe?).
Make sense, maybe for another day.
>
>> + int type = ls->ls_layout_type;
> ls_layout_type is a u32, so please use the same type.
will fix.
>
> Otherwise this looks good to me.
Thanks,
-Dai
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:05 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
` (2 preceding siblings ...)
2025-11-07 13:30 ` [Patch 0/2] " Christoph Hellwig
@ 2025-11-09 18:34 ` Benjamin Coddington
2025-11-11 15:24 ` Dai Ngo
3 siblings, 1 reply; 15+ messages in thread
From: Benjamin Coddington @ 2025-11-09 18:34 UTC (permalink / raw)
To: Dai Ngo
Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On 6 Nov 2025, at 12:05, Dai Ngo wrote:
> When a layout conflict triggers a call to __break_lease, the function
> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> its loop, waiting indefinitely for the conflicting file lease to be
> released.
>
> If the number of lease conflicts matches the number of NFSD threads (which
> defaults to 8), all available NFSD threads become occupied. Consequently,
> there are no threads left to handle incoming requests or callback replies,
> leading to a total hang of the NFS server.
>
> This issue is reliably reproducible by running the Git test suite on a
> configuration using SCSI layout.
>
> This patchset fixes this problem by introducing the new lm_breaker_timedout
> operation to lease_manager_operations and using timeout for layout
> lease break.
Hey Dai,
I like your solution here, but I worry it can cause unexpected or
unnecessary client fencing when the problem is server-side (not enough
threads). Clients might be dutifully sending LAYOUTRETURN, but the server
can't service them - and this change will cause some potentially unexpected
fencing in environments where things could be fixed (by adding more knfsd
threads). Also, I think we significantly bumped default thread counts
recently in nfs-utils:
eb5abb5c60ab (tag: nfs-utils-2-8-2-rc3) nfsd: dump default number of threads to 16
You probably have already seen previous discussions about this:
https://lore.kernel.org/linux-nfs/1CC82EC5-6120-4EE4-A7F0-019CF7BC762C@redhat.com/
This also changes the behavior for all layouts, I haven't thought through
the implications of that - but I wish we could have knob for this behavior,
or perhaps a knfsd-specific fl_break_time tuneable.
Last thought (for now): I think Neil has some work for dynamic knfsd thread
count.. or Jeff? (I am having trouble finding it) Would that work around
this problem?
Regards,
Ben
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-09 18:34 ` Benjamin Coddington
@ 2025-11-11 15:24 ` Dai Ngo
2025-11-11 15:34 ` Chuck Lever
2025-11-11 15:45 ` Jeff Layton
0 siblings, 2 replies; 15+ messages in thread
From: Dai Ngo @ 2025-11-11 15:24 UTC (permalink / raw)
To: Benjamin Coddington
Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
Hi Ben,
On 11/9/25 10:34 AM, Benjamin Coddington wrote:
> On 6 Nov 2025, at 12:05, Dai Ngo wrote:
>
>> When a layout conflict triggers a call to __break_lease, the function
>> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
>> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
>> its loop, waiting indefinitely for the conflicting file lease to be
>> released.
>>
>> If the number of lease conflicts matches the number of NFSD threads (which
>> defaults to 8), all available NFSD threads become occupied. Consequently,
>> there are no threads left to handle incoming requests or callback replies,
>> leading to a total hang of the NFS server.
>>
>> This issue is reliably reproducible by running the Git test suite on a
>> configuration using SCSI layout.
>>
>> This patchset fixes this problem by introducing the new lm_breaker_timedout
>> operation to lease_manager_operations and using timeout for layout
>> lease break.
> Hey Dai,
>
> I like your solution here, but I worry it can cause unexpected or
> unnecessary client fencing when the problem is server-side (not enough
> threads). Clients might be dutifully sending LAYOUTRETURN, but the server
> can't service them
I agreed. This is a server problem and we penalize the client. We need
a long term solution for dealing resource shortage (server threads)
problem.
Fortunately, the client can detect reservation conflict errors and appears
to retry the I/O. Also, the client will ask for new layout and in the
process it re-registers its reservation key so I/O will continue.
> - and this change will cause some potentially unexpected
> fencing in environments where things could be fixed (by adding more knfsd
> threads).
> Also, I think we significantly bumped default thread counts
> recently in nfs-utils:
> eb5abb5c60ab (tag: nfs-utils-2-8-2-rc3) nfsd: dump default number of threads to 16
This helps a bit but if there is always a chance that there is a load
that requires more than the number of server threads.
>
> You probably have already seen previous discussions about this:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/1CC82EC5-6120-4EE4-A7F0-019CF7BC762C@redhat.com/__;!!ACWV5N9M2RV99hQ!Pq4vHQs-qk71XjZ0vOkONTD7nxkuyUUEKTBsJJ0L_OrFWudokphCyc2V0q0_OrNoGD3KnsgoHKp7rb_lDcs$
>
> This also changes the behavior for all layouts, I haven't thought through
> the implications of that - but I wish we could have knob for this behavior,
> or perhaps a knfsd-specific fl_break_time tuneable.
There is already a knob to tune the fl_break_time:
# cat /proc/sys/fs/lease-break-time
but currently lease-break-time is in seconds so the minimum we can set
is 1 which I think is still too long to tight up a server thread.
>
> Last thought (for now): I think Neil has some work for dynamic knfsd thread
> count.. or Jeff? (I am having trouble finding it) Would that work around
> this problem?
This would help, and I prefer this route rather than rework __break_lease
to return EAGAIN/jukebox while the server recalling the layout.
Thank you for your feedback,
-Dai
>
> Regards,
> Ben
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-11 15:24 ` Dai Ngo
@ 2025-11-11 15:34 ` Chuck Lever
2025-11-11 15:36 ` Christoph Hellwig
2025-11-11 15:43 ` Dai Ngo
2025-11-11 15:45 ` Jeff Layton
1 sibling, 2 replies; 15+ messages in thread
From: Chuck Lever @ 2025-11-11 15:34 UTC (permalink / raw)
To: Dai Ngo, Benjamin Coddington
Cc: jlayton, neilb, okorniev, tom, hch, alex.aring, viro, brauner,
jack, linux-fsdevel, linux-kernel, linux-nfs
On 11/11/25 10:24 AM, Dai Ngo wrote:
>>
>> Last thought (for now): I think Neil has some work for dynamic knfsd
>> thread
>> count.. or Jeff? (I am having trouble finding it) Would that work around
>> this problem?
>
> This would help, and I prefer this route rather than rework __break_lease
> to return EAGAIN/jukebox while the server recalling the layout.
Jeff is looking at continuing Neil's work in this area.
Adding more threads, IMHO, is not a good long term solution for this
particular issue. There's no guarantee that the server won't get stuck
no matter how many threads are created, and practically speaking, there
are only so many threads that can be created before the server goes
belly up. Or put another way, there's no way to formally prove that the
server will always be able to make forward progress with this solution.
We want NFSD to have a generic mechanism for deferring work so that an
nfsd thread never waits more than a few dozen milliseconds for anything.
This is the tactic NFSD uses for delegation recalls, for example.
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-11 15:34 ` Chuck Lever
@ 2025-11-11 15:36 ` Christoph Hellwig
2025-11-11 15:43 ` Dai Ngo
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-11-11 15:36 UTC (permalink / raw)
To: Chuck Lever
Cc: Dai Ngo, Benjamin Coddington, jlayton, neilb, okorniev, tom, hch,
alex.aring, viro, brauner, jack, linux-fsdevel, linux-kernel,
linux-nfs
On Tue, Nov 11, 2025 at 10:34:04AM -0500, Chuck Lever wrote:
> > This would help, and I prefer this route rather than rework __break_lease
> > to return EAGAIN/jukebox while the server recalling the layout.
>
> Jeff is looking at continuing Neil's work in this area.
>
> Adding more threads, IMHO, is not a good long term solution for this
> particular issue. There's no guarantee that the server won't get stuck
> no matter how many threads are created, and practically speaking, there
> are only so many threads that can be created before the server goes
> belly up. Or put another way, there's no way to formally prove that the
> server will always be able to make forward progress with this solution.
Agreed.
> We want NFSD to have a generic mechanism for deferring work so that an
> nfsd thread never waits more than a few dozen milliseconds for anything.
> This is the tactic NFSD uses for delegation recalls, for example.
Agreed. This would also be for I/O itself, as with O_DIRECT we can
fully support direct I/O, and even with buffered I/O there is some
limited non-blocking read and write support.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-11 15:34 ` Chuck Lever
2025-11-11 15:36 ` Christoph Hellwig
@ 2025-11-11 15:43 ` Dai Ngo
2025-11-11 15:53 ` Chuck Lever
1 sibling, 1 reply; 15+ messages in thread
From: Dai Ngo @ 2025-11-11 15:43 UTC (permalink / raw)
To: Chuck Lever, Benjamin Coddington
Cc: jlayton, neilb, okorniev, tom, hch, alex.aring, viro, brauner,
jack, linux-fsdevel, linux-kernel, linux-nfs
On 11/11/25 7:34 AM, Chuck Lever wrote:
> On 11/11/25 10:24 AM, Dai Ngo wrote:
>>> Last thought (for now): I think Neil has some work for dynamic knfsd
>>> thread
>>> count.. or Jeff? (I am having trouble finding it) Would that work around
>>> this problem?
>> This would help, and I prefer this route rather than rework __break_lease
>> to return EAGAIN/jukebox while the server recalling the layout.
> Jeff is looking at continuing Neil's work in this area.
>
> Adding more threads, IMHO, is not a good long term solution for this
> particular issue. There's no guarantee that the server won't get stuck
> no matter how many threads are created, and practically speaking, there
> are only so many threads that can be created before the server goes
> belly up. Or put another way, there's no way to formally prove that the
> server will always be able to make forward progress with this solution.
>
> We want NFSD to have a generic mechanism for deferring work so that an
> nfsd thread never waits more than a few dozen milliseconds for anything.
> This is the tactic NFSD uses for delegation recalls, for example.
I think we need both: (1) dynamic number of server threads and (2) the
ability to defer work as we currently do for the delegation recall. I'd
think we need (1) first as it applies for general server operations and
not just layout recalls.
Even if we had both of these enhancements, we still need to enforce timeout
for __break_lease since we don't want to wait for the recall forever.
-Dai
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-11 15:24 ` Dai Ngo
2025-11-11 15:34 ` Chuck Lever
@ 2025-11-11 15:45 ` Jeff Layton
1 sibling, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-11-11 15:45 UTC (permalink / raw)
To: Dai Ngo, Benjamin Coddington
Cc: chuck.lever, neilb, okorniev, tom, hch, alex.aring, viro, brauner,
jack, linux-fsdevel, linux-kernel, linux-nfs
On Tue, 2025-11-11 at 07:24 -0800, Dai Ngo wrote:
> Hi Ben,
>
> On 11/9/25 10:34 AM, Benjamin Coddington wrote:
> > On 6 Nov 2025, at 12:05, Dai Ngo wrote:
> >
> > > When a layout conflict triggers a call to __break_lease, the function
> > > nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> > > the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> > > its loop, waiting indefinitely for the conflicting file lease to be
> > > released.
> > >
> > > If the number of lease conflicts matches the number of NFSD threads (which
> > > defaults to 8), all available NFSD threads become occupied. Consequently,
> > > there are no threads left to handle incoming requests or callback replies,
> > > leading to a total hang of the NFS server.
> > >
> > > This issue is reliably reproducible by running the Git test suite on a
> > > configuration using SCSI layout.
> > >
> > > This patchset fixes this problem by introducing the new lm_breaker_timedout
> > > operation to lease_manager_operations and using timeout for layout
> > > lease break.
> > Hey Dai,
> >
> > I like your solution here, but I worry it can cause unexpected or
> > unnecessary client fencing when the problem is server-side (not enough
> > threads). Clients might be dutifully sending LAYOUTRETURN, but the server
> > can't service them
>
> I agreed. This is a server problem and we penalize the client. We need
> a long term solution for dealing resource shortage (server threads)
> problem.
>
> Fortunately, the client can detect reservation conflict errors and appears
> to retry the I/O. Also, the client will ask for new layout and in the
> process it re-registers its reservation key so I/O will continue.
>
> > - and this change will cause some potentially unexpected
> > fencing in environments where things could be fixed (by adding more knfsd
> > threads).
> > Also, I think we significantly bumped default thread counts
> > recently in nfs-utils:
> > eb5abb5c60ab (tag: nfs-utils-2-8-2-rc3) nfsd: dump default number of threads to 16
>
> This helps a bit but if there is always a chance that there is a load
> that requires more than the number of server threads.
>
> >
> > You probably have already seen previous discussions about this:
> > https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/1CC82EC5-6120-4EE4-A7F0-019CF7BC762C@redhat.com/__;!!ACWV5N9M2RV99hQ!Pq4vHQs-qk71XjZ0vOkONTD7nxkuyUUEKTBsJJ0L_OrFWudokphCyc2V0q0_OrNoGD3KnsgoHKp7rb_lDcs$
> >
> > This also changes the behavior for all layouts, I haven't thought through
> > the implications of that - but I wish we could have knob for this behavior,
> > or perhaps a knfsd-specific fl_break_time tuneable.
>
> There is already a knob to tune the fl_break_time:
> # cat /proc/sys/fs/lease-break-time
>
> but currently lease-break-time is in seconds so the minimum we can set
> is 1 which I think is still too long to tight up a server thread.
>
> >
> > Last thought (for now): I think Neil has some work for dynamic knfsd thread
> > count.. or Jeff? (I am having trouble finding it) Would that work around
> > this problem?
>
It would help, up to a point, but so does increasing the static thread
count. Even if we get dynamically sized threadpool, we'll likely still
have a hard cap on the number of threads. We'll always going to be
subject to this if VFS operations are going to be blocking on
delegation breaks.
> This would help, and I prefer this route rather than rework __break_lease
> to return EAGAIN/jukebox while the server recalling the layout.
>
One way I can see to address this properly is to allow for non-blocking
lease breaks in some fashion. Basically, have the fs return
-EAGAIN (maybe after a short wait) at some point so that maybe
LAYOUTRETURN can get through once the client retries).
Plumbing that intent down to the actual break_layout() calls is a
problem though -- that's a lot of layers. I wonder if we need some per-
task flag that tells the layout engine "always do non-blocking lease
breaks"? That sounds pretty ugly too.
The only other way I could see to fix this is to move to an
asynchronous model of some sort. IOW, have at least some operations
(anything that could conceivably cause a layout break) done
asynchronously.
Then you could dispatch the operation and put the rqstp on a some sort
of waitqueue, and then let the thread do more work instead of blocking.
When the work is done, just requeue the rqstp to send the reply.
Just thinking out loud, but maybe we could use io_uring's underlying
infrastructure for this? Basically, set up an io_uring but do it all in
kernel space in nfsd thread context?
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-11 15:43 ` Dai Ngo
@ 2025-11-11 15:53 ` Chuck Lever
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2025-11-11 15:53 UTC (permalink / raw)
To: Dai Ngo, Benjamin Coddington
Cc: jlayton, neilb, okorniev, tom, hch, alex.aring, viro, brauner,
jack, linux-fsdevel, linux-kernel, linux-nfs
On 11/11/25 10:43 AM, Dai Ngo wrote:
> I think we need both: (1) dynamic number of server threads and (2) the
> ability to defer work as we currently do for the delegation recall.
Agreed. I wasn't trying to imply that dynamic thread count shouldn't be
done at all.
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-11-11 15:53 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-06 17:05 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
2025-11-07 13:26 ` Christoph Hellwig
2025-11-07 16:58 ` Dai Ngo
2025-11-06 17:05 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-07 13:29 ` Christoph Hellwig
2025-11-07 17:01 ` Dai Ngo
2025-11-07 13:30 ` [Patch 0/2] " Christoph Hellwig
2025-11-09 18:34 ` Benjamin Coddington
2025-11-11 15:24 ` Dai Ngo
2025-11-11 15:34 ` Chuck Lever
2025-11-11 15:36 ` Christoph Hellwig
2025-11-11 15:43 ` Dai Ngo
2025-11-11 15:53 ` Chuck Lever
2025-11-11 15:45 ` Jeff Layton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).