* [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
@ 2025-11-06 16:47 Dai Ngo
2025-11-06 16:47 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
2025-11-06 16:47 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
0 siblings, 2 replies; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 16:47 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs
When a layout conflict triggers a call to __break_lease, the function
nfsd4_layout_lm_break clears the fl_break_time timeout before sending
the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
its loop, waiting indefinitely for the conflicting file lease to be
released.
If the number of lease conflicts matches the number of NFSD threads (which
defaults to 8), all available NFSD threads become occupied. Consequently,
there are no threads left to handle incoming requests or callback replies,
leading to a total hang of the NFS server.
This issue is reliably reproducible by running the Git test suite on a
configuration using SCSI layout.
This patchset fixes this problem by introducing the new lm_breaker_timedout
operation to lease_manager_operations and using timeout for layout
lease break.
Documentation/filesystems/locking.rst | 2 ++
fs/locks.c | 14 +++++++++++---
fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
include/linux/filelock.h | 2 ++
4 files changed, 36 insertions(+), 7 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 16:47 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
@ 2025-11-06 16:47 ` Dai Ngo
2025-11-06 17:23 ` Jeff Layton
2025-11-06 16:47 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
1 sibling, 1 reply; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 16:47 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs
Some consumers of the lease_manager_operations need to perform additional
actions when a lease break, triggered by a conflict, times out.
The NFS server is the first consumer of this operation.
When a pNFS layout conflict occurs, and the lease break times out -
resulting in the layout being revoked and its file_lease beeing removed
from the flc_lease list, the NFS server must issue a fence operation.
This ensures that the client is prevented from accessing the data
server after the layout is revoked.
Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
Documentation/filesystems/locking.rst | 2 ++
fs/locks.c | 14 +++++++++++---
include/linux/filelock.h | 2 ++
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 77704fde9845..cd600db6c4b9 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -403,6 +403,7 @@ prototypes::
bool (*lm_breaker_owns_lease)(struct file_lock *);
bool (*lm_lock_expirable)(struct file_lock *);
void (*lm_expire_lock)(void);
+ void (*lm_breaker_timedout)(struct file_lease *);
locking rules:
@@ -416,6 +417,7 @@ lm_change yes no no
lm_breaker_owns_lease: yes no no
lm_lock_expirable yes no no
lm_expire_lock no no yes
+lm_breaker_timedout no no yes
====================== ============= ================= =========
buffer_head
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..1f254e0cd398 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -369,9 +369,15 @@ locks_dispose_list(struct list_head *dispose)
while (!list_empty(dispose)) {
flc = list_first_entry(dispose, struct file_lock_core, flc_list);
list_del_init(&flc->flc_list);
- if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
+ if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+ if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
+ struct file_lease *fl = file_lease(flc);
+
+ if (fl->fl_lmops->lm_breaker_timedout)
+ fl->fl_lmops->lm_breaker_timedout(fl);
+ }
locks_free_lease(file_lease(flc));
- else
+ } else
locks_free_lock(file_lock(flc));
}
}
@@ -1482,8 +1488,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
trace_time_out_leases(inode, fl);
if (past_time(fl->fl_downgrade_time))
lease_modify(fl, F_RDLCK, dispose);
- if (past_time(fl->fl_break_time))
+ if (past_time(fl->fl_break_time)) {
lease_modify(fl, F_UNLCK, dispose);
+ fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
+ }
}
}
diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index c2ce8ba05d06..06ccd6b66012 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -17,6 +17,7 @@
#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
#define FL_LAYOUT 2048 /* outstanding pNFS layout */
#define FL_RECLAIM 4096 /* reclaiming from a reboot server */
+#define FL_BREAKER_TIMEDOUT 8192 /* lease breaker timed out */
#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
@@ -49,6 +50,7 @@ struct lease_manager_operations {
int (*lm_change)(struct file_lease *, int, struct list_head *);
void (*lm_setup)(struct file_lease *, void **);
bool (*lm_breaker_owns_lease)(struct file_lease *);
+ void (*lm_breaker_timedout)(struct file_lease *fl);
};
struct lock_manager {
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 16:47 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 16:47 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
@ 2025-11-06 16:47 ` Dai Ngo
2025-11-06 17:14 ` Jeff Layton
1 sibling, 1 reply; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 16:47 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs
When a layout conflict triggers a call to __break_lease, the function
nfsd4_layout_lm_break clears the fl_break_time timeout before sending
the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
its loop, waiting indefinitely for the conflicting file lease to be
released.
If the number of lease conflicts matches the number of NFSD threads
(which defaults to 8), all available NFSD threads become occupied.
Consequently, there are no threads left to handle incoming requests
or callback replies, leading to a total hang of the NFS server.
This issue is reliably reproducible by running the Git test suite
on a configuration using SCSI layout.
This patch addresses the problem by using the break lease timeout
and ensures that the unresponsive client is fenced, preventing it from
accessing the data server directly.
Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 683bd1130afe..b9b1eb32624c 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -747,11 +747,10 @@ static bool
nfsd4_layout_lm_break(struct file_lease *fl)
{
/*
- * We don't want the locks code to timeout the lease for us;
- * we'll remove it ourself if a layout isn't returned
- * in time:
+ * Enforce break lease timeout to prevent starvation of
+ * NFSD threads in __break_lease that causes server to
+ * hang.
*/
- fl->fl_break_time = 0;
nfsd4_recall_file_layout(fl->c.flc_owner);
return false;
}
@@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
return lease_modify(onlist, arg, dispose);
}
+static void nfsd_layout_breaker_timedout(struct file_lease *fl)
+{
+ struct nfs4_layout_stateid *ls = fl->c.flc_owner;
+ struct nfsd_file *nf;
+
+ rcu_read_lock();
+ nf = nfsd_file_get(ls->ls_file);
+ rcu_read_unlock();
+ if (nf) {
+ int type = ls->ls_layout_type;
+
+ if (nfsd4_layout_ops[type]->fence_client)
+ nfsd4_layout_ops[type]->fence_client(ls, nf);
+ nfsd_file_put(nf);
+ }
+}
+
static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
.lm_break = nfsd4_layout_lm_break,
.lm_change = nfsd4_layout_lm_change,
+ .lm_breaker_timedout = nfsd_layout_breaker_timedout,
};
int
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 17:05 [Patch 0/2] " Dai Ngo
@ 2025-11-06 17:05 ` Dai Ngo
2025-11-07 13:26 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 17:05 UTC (permalink / raw)
To: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack
Cc: linux-fsdevel, linux-kernel, linux-nfs
Some consumers of the lease_manager_operations need to perform additional
actions when a lease break, triggered by a conflict, times out.
The NFS server is the first consumer of this operation.
When a pNFS layout conflict occurs, and the lease break times out -
resulting in the layout being revoked and its file_lease beeing removed
from the flc_lease list, the NFS server must issue a fence operation.
This ensures that the client is prevented from accessing the data
server after the layout is revoked.
Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
Documentation/filesystems/locking.rst | 2 ++
fs/locks.c | 14 +++++++++++---
include/linux/filelock.h | 2 ++
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 77704fde9845..cd600db6c4b9 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -403,6 +403,7 @@ prototypes::
bool (*lm_breaker_owns_lease)(struct file_lock *);
bool (*lm_lock_expirable)(struct file_lock *);
void (*lm_expire_lock)(void);
+ void (*lm_breaker_timedout)(struct file_lease *);
locking rules:
@@ -416,6 +417,7 @@ lm_change yes no no
lm_breaker_owns_lease: yes no no
lm_lock_expirable yes no no
lm_expire_lock no no yes
+lm_breaker_timedout no no yes
====================== ============= ================= =========
buffer_head
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..1f254e0cd398 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -369,9 +369,15 @@ locks_dispose_list(struct list_head *dispose)
while (!list_empty(dispose)) {
flc = list_first_entry(dispose, struct file_lock_core, flc_list);
list_del_init(&flc->flc_list);
- if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
+ if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+ if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
+ struct file_lease *fl = file_lease(flc);
+
+ if (fl->fl_lmops->lm_breaker_timedout)
+ fl->fl_lmops->lm_breaker_timedout(fl);
+ }
locks_free_lease(file_lease(flc));
- else
+ } else
locks_free_lock(file_lock(flc));
}
}
@@ -1482,8 +1488,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
trace_time_out_leases(inode, fl);
if (past_time(fl->fl_downgrade_time))
lease_modify(fl, F_RDLCK, dispose);
- if (past_time(fl->fl_break_time))
+ if (past_time(fl->fl_break_time)) {
lease_modify(fl, F_UNLCK, dispose);
+ fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
+ }
}
}
diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index c2ce8ba05d06..06ccd6b66012 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -17,6 +17,7 @@
#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
#define FL_LAYOUT 2048 /* outstanding pNFS layout */
#define FL_RECLAIM 4096 /* reclaiming from a reboot server */
+#define FL_BREAKER_TIMEDOUT 8192 /* lease breaker timed out */
#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
@@ -49,6 +50,7 @@ struct lease_manager_operations {
int (*lm_change)(struct file_lease *, int, struct list_head *);
void (*lm_setup)(struct file_lease *, void **);
bool (*lm_breaker_owns_lease)(struct file_lease *);
+ void (*lm_breaker_timedout)(struct file_lease *fl);
};
struct lock_manager {
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 16:47 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
@ 2025-11-06 17:14 ` Jeff Layton
2025-11-06 17:17 ` Dai Ngo
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2025-11-06 17:14 UTC (permalink / raw)
To: Dai Ngo, chuck.lever, neilb, okorniev, tom, hch; +Cc: linux-nfs
On Thu, 2025-11-06 at 08:47 -0800, Dai Ngo wrote:
> When a layout conflict triggers a call to __break_lease, the function
> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> its loop, waiting indefinitely for the conflicting file lease to be
> released.
>
> If the number of lease conflicts matches the number of NFSD threads
> (which defaults to 8), all available NFSD threads become occupied.
> Consequently, there are no threads left to handle incoming requests
> or callback replies, leading to a total hang of the NFS server.
>
> This issue is reliably reproducible by running the Git test suite
> on a configuration using SCSI layout.
>
> This patch addresses the problem by using the break lease timeout
> and ensures that the unresponsive client is fenced, preventing it from
> accessing the data server directly.
>
> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> index 683bd1130afe..b9b1eb32624c 100644
> --- a/fs/nfsd/nfs4layouts.c
> +++ b/fs/nfsd/nfs4layouts.c
> @@ -747,11 +747,10 @@ static bool
> nfsd4_layout_lm_break(struct file_lease *fl)
> {
> /*
> - * We don't want the locks code to timeout the lease for us;
> - * we'll remove it ourself if a layout isn't returned
> - * in time:
> + * Enforce break lease timeout to prevent starvation of
> + * NFSD threads in __break_lease that causes server to
> + * hang.
> */
> - fl->fl_break_time = 0;
I guess this ends up with whatever the default fl_break_time is which
is:
jiffies + lease_break_time * HZ;
I wonder if this should be based around some multiple of the grace
period instead?
> nfsd4_recall_file_layout(fl->c.flc_owner);
> return false;
> }
> @@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
> return lease_modify(onlist, arg, dispose);
> }
>
> +static void nfsd_layout_breaker_timedout(struct file_lease *fl)
> +{
> + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
> + struct nfsd_file *nf;
> +
> + rcu_read_lock();
> + nf = nfsd_file_get(ls->ls_file);
> + rcu_read_unlock();
> + if (nf) {
> + int type = ls->ls_layout_type;
> +
> + if (nfsd4_layout_ops[type]->fence_client)
> + nfsd4_layout_ops[type]->fence_client(ls, nf);
> + nfsd_file_put(nf);
> + }
> +}
> +
> static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
> .lm_break = nfsd4_layout_lm_break,
> .lm_change = nfsd4_layout_lm_change,
> + .lm_breaker_timedout = nfsd_layout_breaker_timedout,
> };
>
> int
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:14 ` Jeff Layton
@ 2025-11-06 17:17 ` Dai Ngo
2025-11-06 17:36 ` Jeff Layton
0 siblings, 1 reply; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 17:17 UTC (permalink / raw)
To: Jeff Layton, chuck.lever, neilb, okorniev, tom, hch; +Cc: linux-nfs
On 11/6/25 9:14 AM, Jeff Layton wrote:
> On Thu, 2025-11-06 at 08:47 -0800, Dai Ngo wrote:
>> When a layout conflict triggers a call to __break_lease, the function
>> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
>> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
>> its loop, waiting indefinitely for the conflicting file lease to be
>> released.
>>
>> If the number of lease conflicts matches the number of NFSD threads
>> (which defaults to 8), all available NFSD threads become occupied.
>> Consequently, there are no threads left to handle incoming requests
>> or callback replies, leading to a total hang of the NFS server.
>>
>> This issue is reliably reproducible by running the Git test suite
>> on a configuration using SCSI layout.
>>
>> This patch addresses the problem by using the break lease timeout
>> and ensures that the unresponsive client is fenced, preventing it from
>> accessing the data server directly.
>>
>> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>> fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
>> index 683bd1130afe..b9b1eb32624c 100644
>> --- a/fs/nfsd/nfs4layouts.c
>> +++ b/fs/nfsd/nfs4layouts.c
>> @@ -747,11 +747,10 @@ static bool
>> nfsd4_layout_lm_break(struct file_lease *fl)
>> {
>> /*
>> - * We don't want the locks code to timeout the lease for us;
>> - * we'll remove it ourself if a layout isn't returned
>> - * in time:
>> + * Enforce break lease timeout to prevent starvation of
>> + * NFSD threads in __break_lease that causes server to
>> + * hang.
>> */
>> - fl->fl_break_time = 0;
> I guess this ends up with whatever the default fl_break_time is which
> is:
>
> jiffies + lease_break_time * HZ;
Yes, currently is 45 secs which is, I think, is way too long.
>
> I wonder if this should be based around some multiple of the grace
> period instead?
I think the time to allow for recall reply should be in milliseconds.
-Dai
>
>> nfsd4_recall_file_layout(fl->c.flc_owner);
>> return false;
>> }
>> @@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
>> return lease_modify(onlist, arg, dispose);
>> }
>>
>> +static void nfsd_layout_breaker_timedout(struct file_lease *fl)
>> +{
>> + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
>> + struct nfsd_file *nf;
>> +
>> + rcu_read_lock();
>> + nf = nfsd_file_get(ls->ls_file);
>> + rcu_read_unlock();
>> + if (nf) {
>> + int type = ls->ls_layout_type;
>> +
>> + if (nfsd4_layout_ops[type]->fence_client)
>> + nfsd4_layout_ops[type]->fence_client(ls, nf);
>> + nfsd_file_put(nf);
>> + }
>> +}
>> +
>> static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
>> .lm_break = nfsd4_layout_lm_break,
>> .lm_change = nfsd4_layout_lm_change,
>> + .lm_breaker_timedout = nfsd_layout_breaker_timedout,
>> };
>>
>> int
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 16:47 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
@ 2025-11-06 17:23 ` Jeff Layton
2025-11-06 20:37 ` Dai Ngo
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2025-11-06 17:23 UTC (permalink / raw)
To: Dai Ngo, chuck.lever, neilb, okorniev, tom, hch; +Cc: linux-nfs
On Thu, 2025-11-06 at 08:47 -0800, Dai Ngo wrote:
> Some consumers of the lease_manager_operations need to perform additional
> actions when a lease break, triggered by a conflict, times out.
>
> The NFS server is the first consumer of this operation.
>
> When a pNFS layout conflict occurs, and the lease break times out -
> resulting in the layout being revoked and its file_lease beeing removed
> from the flc_lease list, the NFS server must issue a fence operation.
> This ensures that the client is prevented from accessing the data
> server after the layout is revoked.
>
> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> Documentation/filesystems/locking.rst | 2 ++
> fs/locks.c | 14 +++++++++++---
> include/linux/filelock.h | 2 ++
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 77704fde9845..cd600db6c4b9 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -403,6 +403,7 @@ prototypes::
> bool (*lm_breaker_owns_lease)(struct file_lock *);
> bool (*lm_lock_expirable)(struct file_lock *);
> void (*lm_expire_lock)(void);
> + void (*lm_breaker_timedout)(struct file_lease *);
>
> locking rules:
>
> @@ -416,6 +417,7 @@ lm_change yes no no
> lm_breaker_owns_lease: yes no no
> lm_lock_expirable yes no no
> lm_expire_lock no no yes
> +lm_breaker_timedout no no yes
> ====================== ============= ================= =========
>
> buffer_head
> diff --git a/fs/locks.c b/fs/locks.c
> index 04a3f0e20724..1f254e0cd398 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -369,9 +369,15 @@ locks_dispose_list(struct list_head *dispose)
> while (!list_empty(dispose)) {
> flc = list_first_entry(dispose, struct file_lock_core, flc_list);
> list_del_init(&flc->flc_list);
> - if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
> + if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
> + if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
> + struct file_lease *fl = file_lease(flc);
> +
> + if (fl->fl_lmops->lm_breaker_timedout)
> + fl->fl_lmops->lm_breaker_timedout(fl);
> + }
> locks_free_lease(file_lease(flc));
> - else
> + } else
> locks_free_lock(file_lock(flc));
> }
> }
I think this would be fine in the case initial task that calls
__break_lease(), since that task will also be the one to call
locks_dispose_list() for the lease (and hence will fence the client). I
think though that if you have multiple tasks that end up blocked in
__break_lease(), that the later tasks could end up proceeding before
the client is properly fenced.
We'll need to ensure that any other breaking tasks block until the
fence operations are complete.
> @@ -1482,8 +1488,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
> trace_time_out_leases(inode, fl);
> if (past_time(fl->fl_downgrade_time))
> lease_modify(fl, F_RDLCK, dispose);
> - if (past_time(fl->fl_break_time))
> + if (past_time(fl->fl_break_time)) {
> lease_modify(fl, F_UNLCK, dispose);
> + fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
> + }
> }
> }
>
> diff --git a/include/linux/filelock.h b/include/linux/filelock.h
> index c2ce8ba05d06..06ccd6b66012 100644
> --- a/include/linux/filelock.h
> +++ b/include/linux/filelock.h
> @@ -17,6 +17,7 @@
> #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
> #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> #define FL_RECLAIM 4096 /* reclaiming from a reboot server */
> +#define FL_BREAKER_TIMEDOUT 8192 /* lease breaker timed out */
>
> #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
>
> @@ -49,6 +50,7 @@ struct lease_manager_operations {
> int (*lm_change)(struct file_lease *, int, struct list_head *);
> void (*lm_setup)(struct file_lease *, void **);
> bool (*lm_breaker_owns_lease)(struct file_lease *);
> + void (*lm_breaker_timedout)(struct file_lease *fl);
> };
>
> struct lock_manager {
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:17 ` Dai Ngo
@ 2025-11-06 17:36 ` Jeff Layton
2025-11-06 17:50 ` Dai Ngo
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2025-11-06 17:36 UTC (permalink / raw)
To: Dai Ngo, chuck.lever, neilb, okorniev, tom, hch; +Cc: linux-nfs
On Thu, 2025-11-06 at 09:17 -0800, Dai Ngo wrote:
> On 11/6/25 9:14 AM, Jeff Layton wrote:
> > On Thu, 2025-11-06 at 08:47 -0800, Dai Ngo wrote:
> > > When a layout conflict triggers a call to __break_lease, the function
> > > nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> > > the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> > > its loop, waiting indefinitely for the conflicting file lease to be
> > > released.
> > >
> > > If the number of lease conflicts matches the number of NFSD threads
> > > (which defaults to 8), all available NFSD threads become occupied.
> > > Consequently, there are no threads left to handle incoming requests
> > > or callback replies, leading to a total hang of the NFS server.
> > >
> > > This issue is reliably reproducible by running the Git test suite
> > > on a configuration using SCSI layout.
> > >
> > > This patch addresses the problem by using the break lease timeout
> > > and ensures that the unresponsive client is fenced, preventing it from
> > > accessing the data server directly.
> > >
> > > Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > > fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
> > > 1 file changed, 21 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> > > index 683bd1130afe..b9b1eb32624c 100644
> > > --- a/fs/nfsd/nfs4layouts.c
> > > +++ b/fs/nfsd/nfs4layouts.c
> > > @@ -747,11 +747,10 @@ static bool
> > > nfsd4_layout_lm_break(struct file_lease *fl)
> > > {
> > > /*
> > > - * We don't want the locks code to timeout the lease for us;
> > > - * we'll remove it ourself if a layout isn't returned
> > > - * in time:
> > > + * Enforce break lease timeout to prevent starvation of
> > > + * NFSD threads in __break_lease that causes server to
> > > + * hang.
> > > */
> > > - fl->fl_break_time = 0;
> > I guess this ends up with whatever the default fl_break_time is which
> > is:
> >
> > jiffies + lease_break_time * HZ;
>
> Yes, currently is 45 secs which is, I think, is way too long.
>
> >
> > I wonder if this should be based around some multiple of the grace
> > period instead?
>
> I think the time to allow for recall reply should be in milliseconds.
>
> -Dai
>
I don't think that's at all reasonable. We'll be fencing slow machines
all over the place. Clients expect that they can be out of contact for
a little while (a lease period) and not lose their state. Fencing them
on a timeout substantially less than that will violate that
expectation.
> >
> > > nfsd4_recall_file_layout(fl->c.flc_owner);
> > > return false;
> > > }
> > > @@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
> > > return lease_modify(onlist, arg, dispose);
> > > }
> > >
> > > +static void nfsd_layout_breaker_timedout(struct file_lease *fl)
> > > +{
> > > + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
> > > + struct nfsd_file *nf;
> > > +
> > > + rcu_read_lock();
> > > + nf = nfsd_file_get(ls->ls_file);
> > > + rcu_read_unlock();
> > > + if (nf) {
> > > + int type = ls->ls_layout_type;
> > > +
> > > + if (nfsd4_layout_ops[type]->fence_client)
> > > + nfsd4_layout_ops[type]->fence_client(ls, nf);
> > > + nfsd_file_put(nf);
> > > + }
> > > +}
> > > +
> > > static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
> > > .lm_break = nfsd4_layout_lm_break,
> > > .lm_change = nfsd4_layout_lm_change,
> > > + .lm_breaker_timedout = nfsd_layout_breaker_timedout,
> > > };
> > >
> > > int
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts
2025-11-06 17:36 ` Jeff Layton
@ 2025-11-06 17:50 ` Dai Ngo
0 siblings, 0 replies; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 17:50 UTC (permalink / raw)
To: Jeff Layton, chuck.lever, neilb, okorniev, tom, hch; +Cc: linux-nfs
On 11/6/25 9:36 AM, Jeff Layton wrote:
> On Thu, 2025-11-06 at 09:17 -0800, Dai Ngo wrote:
>> On 11/6/25 9:14 AM, Jeff Layton wrote:
>>> On Thu, 2025-11-06 at 08:47 -0800, Dai Ngo wrote:
>>>> When a layout conflict triggers a call to __break_lease, the function
>>>> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
>>>> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
>>>> its loop, waiting indefinitely for the conflicting file lease to be
>>>> released.
>>>>
>>>> If the number of lease conflicts matches the number of NFSD threads
>>>> (which defaults to 8), all available NFSD threads become occupied.
>>>> Consequently, there are no threads left to handle incoming requests
>>>> or callback replies, leading to a total hang of the NFS server.
>>>>
>>>> This issue is reliably reproducible by running the Git test suite
>>>> on a configuration using SCSI layout.
>>>>
>>>> This patch addresses the problem by using the break lease timeout
>>>> and ensures that the unresponsive client is fenced, preventing it from
>>>> accessing the data server directly.
>>>>
>>>> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>> fs/nfsd/nfs4layouts.c | 25 +++++++++++++++++++++----
>>>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
>>>> index 683bd1130afe..b9b1eb32624c 100644
>>>> --- a/fs/nfsd/nfs4layouts.c
>>>> +++ b/fs/nfsd/nfs4layouts.c
>>>> @@ -747,11 +747,10 @@ static bool
>>>> nfsd4_layout_lm_break(struct file_lease *fl)
>>>> {
>>>> /*
>>>> - * We don't want the locks code to timeout the lease for us;
>>>> - * we'll remove it ourself if a layout isn't returned
>>>> - * in time:
>>>> + * Enforce break lease timeout to prevent starvation of
>>>> + * NFSD threads in __break_lease that causes server to
>>>> + * hang.
>>>> */
>>>> - fl->fl_break_time = 0;
>>> I guess this ends up with whatever the default fl_break_time is which
>>> is:
>>>
>>> jiffies + lease_break_time * HZ;
>> Yes, currently is 45 secs which is, I think, is way too long.
>>
>>> I wonder if this should be based around some multiple of the grace
>>> period instead?
>> I think the time to allow for recall reply should be in milliseconds.
>>
>> -Dai
>>
> I don't think that's at all reasonable. We'll be fencing slow machines
> all over the place. Clients expect that they can be out of contact for
> a little while (a lease period) and not lose their state. Fencing them
> on a timeout substantially less than that will violate that
> expectation.
That is fine. But the way __break_lease is implemented now is that the
NFSD thread (limited resource) is tied up for a very long time to wait
or the client, which does not seem right.
Also, in __break_lease, if the first lease in the flc_lease list is not
the one that causes the conflict then its fl_break_time is 0 which causes
wait in wait_event_interruptible_timeout to be 1 sec. And that thread
just chews up CPU cycles in 1 second interval.
For long term, perhaps we should find other ways to implement __break_lease.
-Dai
>
>>>> nfsd4_recall_file_layout(fl->c.flc_owner);
>>>> return false;
>>>> }
>>>> @@ -764,9 +763,27 @@ nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
>>>> return lease_modify(onlist, arg, dispose);
>>>> }
>>>>
>>>> +static void nfsd_layout_breaker_timedout(struct file_lease *fl)
>>>> +{
>>>> + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
>>>> + struct nfsd_file *nf;
>>>> +
>>>> + rcu_read_lock();
>>>> + nf = nfsd_file_get(ls->ls_file);
>>>> + rcu_read_unlock();
>>>> + if (nf) {
>>>> + int type = ls->ls_layout_type;
>>>> +
>>>> + if (nfsd4_layout_ops[type]->fence_client)
>>>> + nfsd4_layout_ops[type]->fence_client(ls, nf);
>>>> + nfsd_file_put(nf);
>>>> + }
>>>> +}
>>>> +
>>>> static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
>>>> .lm_break = nfsd4_layout_lm_break,
>>>> .lm_change = nfsd4_layout_lm_change,
>>>> + .lm_breaker_timedout = nfsd_layout_breaker_timedout,
>>>> };
>>>>
>>>> int
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 17:23 ` Jeff Layton
@ 2025-11-06 20:37 ` Dai Ngo
0 siblings, 0 replies; 12+ messages in thread
From: Dai Ngo @ 2025-11-06 20:37 UTC (permalink / raw)
To: Jeff Layton, chuck.lever, neilb, okorniev, tom, hch; +Cc: linux-nfs
On 11/6/25 9:23 AM, Jeff Layton wrote:
> On Thu, 2025-11-06 at 08:47 -0800, Dai Ngo wrote:
>> Some consumers of the lease_manager_operations need to perform additional
>> actions when a lease break, triggered by a conflict, times out.
>>
>> The NFS server is the first consumer of this operation.
>>
>> When a pNFS layout conflict occurs, and the lease break times out -
>> resulting in the layout being revoked and its file_lease beeing removed
>> from the flc_lease list, the NFS server must issue a fence operation.
>> This ensures that the client is prevented from accessing the data
>> server after the layout is revoked.
>>
>> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>> Documentation/filesystems/locking.rst | 2 ++
>> fs/locks.c | 14 +++++++++++---
>> include/linux/filelock.h | 2 ++
>> 3 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>> index 77704fde9845..cd600db6c4b9 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -403,6 +403,7 @@ prototypes::
>> bool (*lm_breaker_owns_lease)(struct file_lock *);
>> bool (*lm_lock_expirable)(struct file_lock *);
>> void (*lm_expire_lock)(void);
>> + void (*lm_breaker_timedout)(struct file_lease *);
>>
>> locking rules:
>>
>> @@ -416,6 +417,7 @@ lm_change yes no no
>> lm_breaker_owns_lease: yes no no
>> lm_lock_expirable yes no no
>> lm_expire_lock no no yes
>> +lm_breaker_timedout no no yes
>> ====================== ============= ================= =========
>>
>> buffer_head
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 04a3f0e20724..1f254e0cd398 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -369,9 +369,15 @@ locks_dispose_list(struct list_head *dispose)
>> while (!list_empty(dispose)) {
>> flc = list_first_entry(dispose, struct file_lock_core, flc_list);
>> list_del_init(&flc->flc_list);
>> - if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
>> + if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
>> + if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
>> + struct file_lease *fl = file_lease(flc);
>> +
>> + if (fl->fl_lmops->lm_breaker_timedout)
>> + fl->fl_lmops->lm_breaker_timedout(fl);
>> + }
>> locks_free_lease(file_lease(flc));
>> - else
>> + } else
>> locks_free_lock(file_lock(flc));
>> }
>> }
> I think this would be fine in the case initial task that calls
> __break_lease(), since that task will also be the one to call
> locks_dispose_list() for the lease (and hence will fence the client). I
> think though that if you have multiple tasks that end up blocked in
> __break_lease(), that the later tasks could end up proceeding before
> the client is properly fenced.
>
> We'll need to ensure that any other breaking tasks block until the
> fence operations are complete.
Thanks Jeff, this can be a problem. I'll post v2 patch.
-Dai
>
>
>> @@ -1482,8 +1488,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>> trace_time_out_leases(inode, fl);
>> if (past_time(fl->fl_downgrade_time))
>> lease_modify(fl, F_RDLCK, dispose);
>> - if (past_time(fl->fl_break_time))
>> + if (past_time(fl->fl_break_time)) {
>> lease_modify(fl, F_UNLCK, dispose);
>> + fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
>> + }
>> }
>> }
>>
>> diff --git a/include/linux/filelock.h b/include/linux/filelock.h
>> index c2ce8ba05d06..06ccd6b66012 100644
>> --- a/include/linux/filelock.h
>> +++ b/include/linux/filelock.h
>> @@ -17,6 +17,7 @@
>> #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
>> #define FL_LAYOUT 2048 /* outstanding pNFS layout */
>> #define FL_RECLAIM 4096 /* reclaiming from a reboot server */
>> +#define FL_BREAKER_TIMEDOUT 8192 /* lease breaker timed out */
>>
>> #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
>>
>> @@ -49,6 +50,7 @@ struct lease_manager_operations {
>> int (*lm_change)(struct file_lease *, int, struct list_head *);
>> void (*lm_setup)(struct file_lease *, void **);
>> bool (*lm_breaker_owns_lease)(struct file_lease *);
>> + void (*lm_breaker_timedout)(struct file_lease *fl);
>> };
>>
>> struct lock_manager {
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
@ 2025-11-07 13:26 ` Christoph Hellwig
2025-11-07 16:58 ` Dai Ngo
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2025-11-07 13:26 UTC (permalink / raw)
To: Dai Ngo
Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On Thu, Nov 06, 2025 at 09:05:25AM -0800, Dai Ngo wrote:
> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
I don't think adding an operation can fix nfsd code. This is just
infrastructure you need for the fix that sits in the nfsd code.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations
2025-11-07 13:26 ` Christoph Hellwig
@ 2025-11-07 16:58 ` Dai Ngo
0 siblings, 0 replies; 12+ messages in thread
From: Dai Ngo @ 2025-11-07 16:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: chuck.lever, jlayton, neilb, okorniev, tom, alex.aring, viro,
brauner, jack, linux-fsdevel, linux-kernel, linux-nfs
On 11/7/25 5:26 AM, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 09:05:25AM -0800, Dai Ngo wrote:
>> Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
> I don't think adding an operation can fix nfsd code. This is just
> infrastructure you need for the fix that sits in the nfsd code.
The reason I added this tag is because patch 2/2 needs patch 1/2.
Since patch 2/2 has the fixes tag, could someone accidentally apply
it without patch 1/2?
-Dai
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-11-07 16:58 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-06 16:47 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 16:47 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
2025-11-06 17:23 ` Jeff Layton
2025-11-06 20:37 ` Dai Ngo
2025-11-06 16:47 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 17:14 ` Jeff Layton
2025-11-06 17:17 ` Dai Ngo
2025-11-06 17:36 ` Jeff Layton
2025-11-06 17:50 ` Dai Ngo
-- strict thread matches above, loose matches on Subject: below --
2025-11-06 17:05 [Patch 0/2] " Dai Ngo
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
2025-11-07 13:26 ` Christoph Hellwig
2025-11-07 16:58 ` Dai Ngo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox