* [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface
@ 2025-05-28 0:12 Jeff Layton
2025-05-28 0:12 ` [PATCH 1/2] nfsd: use threads array as-is in netlink interface Jeff Layton
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Jeff Layton @ 2025-05-28 0:12 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev,
Jeff Layton
The first patch is probably appropriate for stable. It should fix
problems when someone sets the pool_mode to pernode, without userland
sending down a fully-populated thread array.
The second patch just adds a couple of new tracepoints that I ended up
using to track this down.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Jeff Layton (2):
nfsd: use threads array as-is in netlink interface
sunrpc: new tracepoints around svc thread wakeups
fs/nfsd/nfsctl.c | 5 ++---
include/trace/events/sunrpc.h | 23 ++++++++++++++++++-----
net/sunrpc/svc.c | 5 ++++-
3 files changed, 24 insertions(+), 9 deletions(-)
---
base-commit: 914873bc7df913db988284876c16257e6ab772c6
change-id: 20250527-rpc-numa-600daff3f36e
Best regards,
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-05-28 0:12 [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface Jeff Layton
@ 2025-05-28 0:12 ` Jeff Layton
2025-05-28 17:07 ` Simon Horman
2025-06-12 15:57 ` Jeff Layton
2025-05-28 0:12 ` [PATCH 2/2] sunrpc: new tracepoints around svc thread wakeups Jeff Layton
2025-05-28 18:11 ` [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface cel
2 siblings, 2 replies; 19+ messages in thread
From: Jeff Layton @ 2025-05-28 0:12 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev,
Jeff Layton
The old nfsdfs interface for starting a server with multiple pools
handles the special case of a single entry array passed down from
userland by distributing the threads over every NUMA node.
The netlink control interface however constructs an array of length
nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
defeats the special casing that the old interface relies on.
Change nfsd_nl_threads_set_doit() to pass down the array from userland
as-is.
Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
Reported-by: Mike Snitzer <snitzer@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfsctl.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
*/
int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
{
- int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
+ int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
struct net *net = genl_info_net(info);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
const struct nlattr *attr;
@@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
/* count number of SERVER_THREADS values */
nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
if (nla_type(attr) == NFSD_A_SERVER_THREADS)
- count++;
+ nrpools++;
}
mutex_lock(&nfsd_mutex);
- nrpools = max(count, nfsd_nrpools(net));
nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
if (!nthreads) {
ret = -ENOMEM;
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/2] sunrpc: new tracepoints around svc thread wakeups
2025-05-28 0:12 [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface Jeff Layton
2025-05-28 0:12 ` [PATCH 1/2] nfsd: use threads array as-is in netlink interface Jeff Layton
@ 2025-05-28 0:12 ` Jeff Layton
2025-05-28 17:13 ` Simon Horman
2025-05-28 18:11 ` [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface cel
2 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2025-05-28 0:12 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev,
Jeff Layton
Convert the svc_wake_up tracepoint into svc_pool_thread_event class.
Have it also record the pool id, and add new tracepoints for when the
thread is already running and for when there are no idle threads.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
include/trace/events/sunrpc.h | 23 ++++++++++++++++++-----
net/sunrpc/svc.c | 5 ++++-
2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 5d331383047b79b9f6dcd699c87287453c1a5f49..d23009b4dc979fd7eebcfb6bc3164608f74ab23b 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -2124,22 +2124,35 @@ TRACE_EVENT(svc_xprt_accept,
)
);
-TRACE_EVENT(svc_wake_up,
- TP_PROTO(int pid),
+DECLARE_EVENT_CLASS(svc_pool_thread_event,
+ TP_PROTO(const struct svc_pool *pool, pid_t pid),
- TP_ARGS(pid),
+ TP_ARGS(pool, pid),
TP_STRUCT__entry(
- __field(int, pid)
+ __field(unsigned int, pool_id)
+ __field(pid_t, pid)
),
TP_fast_assign(
+ __entry->pool_id = pool->sp_id;
__entry->pid = pid;
),
- TP_printk("pid=%d", __entry->pid)
+ TP_printk("pool=%u pid=%d", __entry->pool_id, __entry->pid)
);
+#define DEFINE_SVC_POOL_THREAD_EVENT(name) \
+ DEFINE_EVENT(svc_pool_thread_event, svc_pool_thread_##name, \
+ TP_PROTO( \
+ const struct svc_pool *pool, pid_t pid \
+ ), \
+ TP_ARGS(pool, pid))
+
+DEFINE_SVC_POOL_THREAD_EVENT(wake);
+DEFINE_SVC_POOL_THREAD_EVENT(running);
+DEFINE_SVC_POOL_THREAD_EVENT(noidle);
+
TRACE_EVENT(svc_alloc_arg_err,
TP_PROTO(
unsigned int requested,
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index e7f9c295d13c03bf28a5eeec839fd85e24f5525f..de80d3683350dc86bee3413719797dcf7a4562e8 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -749,13 +749,16 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool)
WRITE_ONCE(rqstp->rq_qtime, ktime_get());
if (!task_is_running(rqstp->rq_task)) {
wake_up_process(rqstp->rq_task);
- trace_svc_wake_up(rqstp->rq_task->pid);
+ trace_svc_pool_thread_wake(pool, rqstp->rq_task->pid);
percpu_counter_inc(&pool->sp_threads_woken);
+ } else {
+ trace_svc_pool_thread_running(pool, rqstp->rq_task->pid);
}
rcu_read_unlock();
return;
}
rcu_read_unlock();
+ trace_svc_pool_thread_noidle(pool, rqstp->rq_task->pid);
}
EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread);
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-05-28 0:12 ` [PATCH 1/2] nfsd: use threads array as-is in netlink interface Jeff Layton
@ 2025-05-28 17:07 ` Simon Horman
2025-06-12 15:57 ` Jeff Layton
1 sibling, 0 replies; 19+ messages in thread
From: Simon Horman @ 2025-05-28 17:07 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Mike Snitzer, linux-nfs,
linux-kernel, linux-trace-kernel, netdev
On Tue, May 27, 2025 at 08:12:47PM -0400, Jeff Layton wrote:
> The old nfsdfs interface for starting a server with multiple pools
> handles the special case of a single entry array passed down from
> userland by distributing the threads over every NUMA node.
>
> The netlink control interface however constructs an array of length
> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> defeats the special casing that the old interface relies on.
>
> Change nfsd_nl_threads_set_doit() to pass down the array from userland
> as-is.
>
> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> Reported-by: Mike Snitzer <snitzer@kernel.org>
> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] sunrpc: new tracepoints around svc thread wakeups
2025-05-28 0:12 ` [PATCH 2/2] sunrpc: new tracepoints around svc thread wakeups Jeff Layton
@ 2025-05-28 17:13 ` Simon Horman
0 siblings, 0 replies; 19+ messages in thread
From: Simon Horman @ 2025-05-28 17:13 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Mike Snitzer, linux-nfs,
linux-kernel, linux-trace-kernel, netdev
On Tue, May 27, 2025 at 08:12:48PM -0400, Jeff Layton wrote:
> Convert the svc_wake_up tracepoint into svc_pool_thread_event class.
> Have it also record the pool id, and add new tracepoints for when the
> thread is already running and for when there are no idle threads.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
...
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index e7f9c295d13c03bf28a5eeec839fd85e24f5525f..de80d3683350dc86bee3413719797dcf7a4562e8 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -749,13 +749,16 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool)
> WRITE_ONCE(rqstp->rq_qtime, ktime_get());
> if (!task_is_running(rqstp->rq_task)) {
> wake_up_process(rqstp->rq_task);
> - trace_svc_wake_up(rqstp->rq_task->pid);
> + trace_svc_pool_thread_wake(pool, rqstp->rq_task->pid);
> percpu_counter_inc(&pool->sp_threads_woken);
> + } else {
> + trace_svc_pool_thread_running(pool, rqstp->rq_task->pid);
> }
> rcu_read_unlock();
> return;
> }
> rcu_read_unlock();
Hi Jeff,
Above, rqstp is conditionally initialised if ln
(pool->sp_idle_threads.first) is not NULL.
But below it is dereferenced unconditionally.
This does not seem consistent.
Reported by clang-20.1.4 builds and Smatch.
> + trace_svc_pool_thread_noidle(pool, rqstp->rq_task->pid);
>
> }
> EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread);
>
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface
2025-05-28 0:12 [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface Jeff Layton
2025-05-28 0:12 ` [PATCH 1/2] nfsd: use threads array as-is in netlink interface Jeff Layton
2025-05-28 0:12 ` [PATCH 2/2] sunrpc: new tracepoints around svc thread wakeups Jeff Layton
@ 2025-05-28 18:11 ` cel
2025-05-28 18:22 ` Jeff Layton
2 siblings, 1 reply; 19+ messages in thread
From: cel @ 2025-05-28 18:11 UTC (permalink / raw)
To: Olga Kornievskaia, Dai Ngo, Tom Talpey, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Trond Myklebust,
Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, NeilBrown, Jeff Layton
Cc: Chuck Lever, Mike Snitzer, linux-nfs, linux-kernel,
linux-trace-kernel, netdev
From: Chuck Lever <chuck.lever@oracle.com>
On Tue, 27 May 2025 20:12:46 -0400, Jeff Layton wrote:
> The first patch is probably appropriate for stable. It should fix
> problems when someone sets the pool_mode to pernode, without userland
> sending down a fully-populated thread array.
>
> The second patch just adds a couple of new tracepoints that I ended up
> using to track this down.
>
> [...]
Applied to nfsd-testing, thanks!
[1/2] nfsd: use threads array as-is in netlink interface
commit: b2a9a114a3c7f5abfa2875b70ce9b73525a74291
[2/2] sunrpc: new tracepoints around svc thread wakeups
commit: 65b8babe551bddf00aac69bc905f88a4e0371766
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface
2025-05-28 18:11 ` [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface cel
@ 2025-05-28 18:22 ` Jeff Layton
2025-05-28 18:25 ` Chuck Lever
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2025-05-28 18:22 UTC (permalink / raw)
To: cel, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Trond Myklebust,
Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, NeilBrown
Cc: Chuck Lever, Mike Snitzer, linux-nfs, linux-kernel,
linux-trace-kernel, netdev
On Wed, 2025-05-28 at 14:11 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> On Tue, 27 May 2025 20:12:46 -0400, Jeff Layton wrote:
> > The first patch is probably appropriate for stable. It should fix
> > problems when someone sets the pool_mode to pernode, without userland
> > sending down a fully-populated thread array.
> >
> > The second patch just adds a couple of new tracepoints that I ended up
> > using to track this down.
> >
> > [...]
>
> Applied to nfsd-testing, thanks!
>
> [1/2] nfsd: use threads array as-is in netlink interface
> commit: b2a9a114a3c7f5abfa2875b70ce9b73525a74291
> [2/2] sunrpc: new tracepoints around svc thread wakeups
> commit: 65b8babe551bddf00aac69bc905f88a4e0371766
>
My apologies, Chuck. Patch #2 has a bug in it:
+ trace_svc_pool_thread_noidle(pool, rqstp->rq_task->pid);
In the call above, the rqstp will be undefined. That should be:
+ trace_svc_pool_thread_noidle(pool, 0);
You can fix that up in tree, or I can resend if you prefer.
Thanks!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface
2025-05-28 18:22 ` Jeff Layton
@ 2025-05-28 18:25 ` Chuck Lever
0 siblings, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2025-05-28 18:25 UTC (permalink / raw)
To: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, NeilBrown
Cc: Chuck Lever, Mike Snitzer, linux-nfs, linux-kernel,
linux-trace-kernel, netdev
On 5/28/25 2:22 PM, Jeff Layton wrote:
> On Wed, 2025-05-28 at 14:11 -0400, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> On Tue, 27 May 2025 20:12:46 -0400, Jeff Layton wrote:
>>> The first patch is probably appropriate for stable. It should fix
>>> problems when someone sets the pool_mode to pernode, without userland
>>> sending down a fully-populated thread array.
>>>
>>> The second patch just adds a couple of new tracepoints that I ended up
>>> using to track this down.
>>>
>>> [...]
>>
>> Applied to nfsd-testing, thanks!
>>
>> [1/2] nfsd: use threads array as-is in netlink interface
>> commit: b2a9a114a3c7f5abfa2875b70ce9b73525a74291
>> [2/2] sunrpc: new tracepoints around svc thread wakeups
>> commit: 65b8babe551bddf00aac69bc905f88a4e0371766
>>
>
> My apologies, Chuck. Patch #2 has a bug in it:
>
> + trace_svc_pool_thread_noidle(pool, rqstp->rq_task->pid);
>
> In the call above, the rqstp will be undefined. That should be:
>
> + trace_svc_pool_thread_noidle(pool, 0);
>
> You can fix that up in tree, or I can resend if you prefer.
Fixed.
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-05-28 0:12 ` [PATCH 1/2] nfsd: use threads array as-is in netlink interface Jeff Layton
2025-05-28 17:07 ` Simon Horman
@ 2025-06-12 15:57 ` Jeff Layton
2025-06-12 16:05 ` Chuck Lever
2025-06-13 18:57 ` Mike Snitzer
1 sibling, 2 replies; 19+ messages in thread
From: Jeff Layton @ 2025-06-12 15:57 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev
On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
> The old nfsdfs interface for starting a server with multiple pools
> handles the special case of a single entry array passed down from
> userland by distributing the threads over every NUMA node.
>
> The netlink control interface however constructs an array of length
> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> defeats the special casing that the old interface relies on.
>
> Change nfsd_nl_threads_set_doit() to pass down the array from userland
> as-is.
>
> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> Reported-by: Mike Snitzer <snitzer@kernel.org>
> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/nfsctl.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> */
> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> {
> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
> struct net *net = genl_info_net(info);
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> const struct nlattr *attr;
> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> /* count number of SERVER_THREADS values */
> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
> - count++;
> + nrpools++;
> }
>
> mutex_lock(&nfsd_mutex);
>
> - nrpools = max(count, nfsd_nrpools(net));
> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
> if (!nthreads) {
> ret = -ENOMEM;
I noticed that this didn't go in to the recent merge window.
This patch fixes a rather nasty regression when you try to start the
server on a NUMA-capable box. It all looks like it works, but some RPCs
get silently dropped on the floor (if they happen to be received into a
node with no threads). It took me a while to track down the problem
after Mike reported it.
Can we go ahead and pull this in and send it to stable?
Also, did this patch fix the problem for you, Mike?
Thanks,
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-12 15:57 ` Jeff Layton
@ 2025-06-12 16:05 ` Chuck Lever
2025-06-12 16:15 ` Jeff Layton
2025-06-13 18:57 ` Mike Snitzer
1 sibling, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2025-06-12 16:05 UTC (permalink / raw)
To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev
On 6/12/25 11:57 AM, Jeff Layton wrote:
> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>> The old nfsdfs interface for starting a server with multiple pools
>> handles the special case of a single entry array passed down from
>> userland by distributing the threads over every NUMA node.
>>
>> The netlink control interface however constructs an array of length
>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>> defeats the special casing that the old interface relies on.
>>
>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>> as-is.
>>
>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>> Reported-by: Mike Snitzer <snitzer@kernel.org>
>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>> fs/nfsd/nfsctl.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>> --- a/fs/nfsd/nfsctl.c
>> +++ b/fs/nfsd/nfsctl.c
>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>> */
>> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>> {
>> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>> struct net *net = genl_info_net(info);
>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>> const struct nlattr *attr;
>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>> /* count number of SERVER_THREADS values */
>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>> - count++;
>> + nrpools++;
>> }
>>
>> mutex_lock(&nfsd_mutex);
>>
>> - nrpools = max(count, nfsd_nrpools(net));
>> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>> if (!nthreads) {
>> ret = -ENOMEM;
>
> I noticed that this didn't go in to the recent merge window.
>
> This patch fixes a rather nasty regression when you try to start the
> server on a NUMA-capable box.
The NFSD netlink interface is not broadly used yet, is it?
Since this one came in late during the 6.16 dev cycle and the Fixes: tag
references a commit that is already in released kernels, I put in the
"next merge window" pile. On it's own it doesn't look urgent to me.
> It all looks like it works, but some RPCs
> get silently dropped on the floor (if they happen to be received into a
> node with no threads). It took me a while to track down the problem
> after Mike reported it.
>
> Can we go ahead and pull this in and send it to stable?
>
> Also, did this patch fix the problem for you, Mike?
I'll wait for confirmation.
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-12 16:05 ` Chuck Lever
@ 2025-06-12 16:15 ` Jeff Layton
2025-06-12 16:42 ` Chuck Lever
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2025-06-12 16:15 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev
On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
> On 6/12/25 11:57 AM, Jeff Layton wrote:
> > On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
> > > The old nfsdfs interface for starting a server with multiple pools
> > > handles the special case of a single entry array passed down from
> > > userland by distributing the threads over every NUMA node.
> > >
> > > The netlink control interface however constructs an array of length
> > > nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> > > defeats the special casing that the old interface relies on.
> > >
> > > Change nfsd_nl_threads_set_doit() to pass down the array from userland
> > > as-is.
> > >
> > > Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> > > Reported-by: Mike Snitzer <snitzer@kernel.org>
> > > Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > fs/nfsd/nfsctl.c | 5 ++---
> > > 1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> > > */
> > > int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > {
> > > - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
> > > + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
> > > struct net *net = genl_info_net(info);
> > > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > const struct nlattr *attr;
> > > @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > /* count number of SERVER_THREADS values */
> > > nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > if (nla_type(attr) == NFSD_A_SERVER_THREADS)
> > > - count++;
> > > + nrpools++;
> > > }
> > >
> > > mutex_lock(&nfsd_mutex);
> > >
> > > - nrpools = max(count, nfsd_nrpools(net));
> > > nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
> > > if (!nthreads) {
> > > ret = -ENOMEM;
> >
> > I noticed that this didn't go in to the recent merge window.
> >
> > This patch fixes a rather nasty regression when you try to start the
> > server on a NUMA-capable box.
>
> The NFSD netlink interface is not broadly used yet, is it?
>
It is. RHEL10 shipped with it, for instance and it's been in Fedora for
a while.
> Since this one came in late during the 6.16 dev cycle and the Fixes: tag
> references a commit that is already in released kernels, I put in the
> "next merge window" pile. On it's own it doesn't look urgent to me.
>
I'd really like to see this go in soon and to stable. If you want me to
respin the changelog, I can. It's not a crash, but it manifests as lost
RPCs that just hang. It took me quite a while to figure out what was
going on, and I'd prefer that we not put users through that.
>
> > It all looks like it works, but some RPCs
> > get silently dropped on the floor (if they happen to be received into a
> > node with no threads). It took me a while to track down the problem
> > after Mike reported it.
> >
> > Can we go ahead and pull this in and send it to stable?
> >
> > Also, did this patch fix the problem for you, Mike?
>
> I'll wait for confirmation.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-12 16:15 ` Jeff Layton
@ 2025-06-12 16:42 ` Chuck Lever
2025-06-13 11:33 ` Benjamin Coddington
0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2025-06-12 16:42 UTC (permalink / raw)
To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Mike Snitzer, linux-nfs, linux-kernel, linux-trace-kernel, netdev
On 6/12/25 12:15 PM, Jeff Layton wrote:
> On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
>> On 6/12/25 11:57 AM, Jeff Layton wrote:
>>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>>> The old nfsdfs interface for starting a server with multiple pools
>>>> handles the special case of a single entry array passed down from
>>>> userland by distributing the threads over every NUMA node.
>>>>
>>>> The netlink control interface however constructs an array of length
>>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>>> defeats the special casing that the old interface relies on.
>>>>
>>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>>> as-is.
>>>>
>>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>>> Reported-by: Mike Snitzer <snitzer@kernel.org>
>>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>> fs/nfsd/nfsctl.c | 5 ++---
>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>>> --- a/fs/nfsd/nfsctl.c
>>>> +++ b/fs/nfsd/nfsctl.c
>>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>>> */
>>>> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> {
>>>> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>>> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>>> struct net *net = genl_info_net(info);
>>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>> const struct nlattr *attr;
>>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> /* count number of SERVER_THREADS values */
>>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>>> - count++;
>>>> + nrpools++;
>>>> }
>>>>
>>>> mutex_lock(&nfsd_mutex);
>>>>
>>>> - nrpools = max(count, nfsd_nrpools(net));
>>>> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>>> if (!nthreads) {
>>>> ret = -ENOMEM;
>>>
>>> I noticed that this didn't go in to the recent merge window.
>>>
>>> This patch fixes a rather nasty regression when you try to start the
>>> server on a NUMA-capable box.
>>
>> The NFSD netlink interface is not broadly used yet, is it?
>>
>
> It is. RHEL10 shipped with it, for instance and it's been in Fedora for
> a while.
RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
either of these are deployed in production environments yet. Just sayin
that in this case, the Bayesian filter leans towards waiting a full dev
cycle.
>> Since this one came in late during the 6.16 dev cycle and the Fixes: tag
>> references a commit that is already in released kernels, I put in the
>> "next merge window" pile. On it's own it doesn't look urgent to me.
>>
>
> I'd really like to see this go in soon and to stable. If you want me to
> respin the changelog, I can. It's not a crash, but it manifests as lost
> RPCs that just hang. It took me quite a while to figure out what was
> going on, and I'd prefer that we not put users through that.
If someone can confirm that it is effective, I'll add it to nfsd-fixes.
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-12 16:42 ` Chuck Lever
@ 2025-06-13 11:33 ` Benjamin Coddington
2025-06-13 14:56 ` Chuck Lever
0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Coddington @ 2025-06-13 11:33 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mike Snitzer,
linux-nfs, linux-kernel, linux-trace-kernel, netdev
On 12 Jun 2025, at 12:42, Chuck Lever wrote:
> On 6/12/25 12:15 PM, Jeff Layton wrote:
>> On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
>>> On 6/12/25 11:57 AM, Jeff Layton wrote:
>>>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>>>> The old nfsdfs interface for starting a server with multiple pools
>>>>> handles the special case of a single entry array passed down from
>>>>> userland by distributing the threads over every NUMA node.
>>>>>
>>>>> The netlink control interface however constructs an array of length
>>>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>>>> defeats the special casing that the old interface relies on.
>>>>>
>>>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>>>> as-is.
>>>>>
>>>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>>>> Reported-by: Mike Snitzer <snitzer@kernel.org>
>>>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>> ---
>>>>> fs/nfsd/nfsctl.c | 5 ++---
>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>>>> --- a/fs/nfsd/nfsctl.c
>>>>> +++ b/fs/nfsd/nfsctl.c
>>>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>>>> */
>>>>> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>> {
>>>>> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>>>> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>>>> struct net *net = genl_info_net(info);
>>>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>>> const struct nlattr *attr;
>>>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>> /* count number of SERVER_THREADS values */
>>>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>>>> - count++;
>>>>> + nrpools++;
>>>>> }
>>>>>
>>>>> mutex_lock(&nfsd_mutex);
>>>>>
>>>>> - nrpools = max(count, nfsd_nrpools(net));
>>>>> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>>>> if (!nthreads) {
>>>>> ret = -ENOMEM;
>>>>
>>>> I noticed that this didn't go in to the recent merge window.
>>>>
>>>> This patch fixes a rather nasty regression when you try to start the
>>>> server on a NUMA-capable box.
>>>
>>> The NFSD netlink interface is not broadly used yet, is it?
>>>
>>
>> It is. RHEL10 shipped with it, for instance and it's been in Fedora for
>> a while.
>
> RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
> either of these are deployed in production environments yet. Just sayin
> that in this case, the Bayesian filter leans towards waiting a full dev
> cycle.
We don't consider it acceptable to allow known defects to persist in our
products just because they are bleeding edge.
>>> Since this one came in late during the 6.16 dev cycle and the Fixes: tag
>>> references a commit that is already in released kernels, I put in the
>>> "next merge window" pile. On it's own it doesn't look urgent to me.
>>>
>>
>> I'd really like to see this go in soon and to stable. If you want me to
>> respin the changelog, I can. It's not a crash, but it manifests as lost
>> RPCs that just hang. It took me quite a while to figure out what was
>> going on, and I'd prefer that we not put users through that.
>
> If someone can confirm that it is effective, I'll add it to nfsd-fixes.
I'm sure it is if Jeff spent time on it.
We're going to try to get this into RHEL-10 ASAP, because dropped RPCs
manifest as datacenter-wide problems that are very hard to diagnose. Its a
real pain that we won't have an upstream commit assigned for it.
Ben
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-13 11:33 ` Benjamin Coddington
@ 2025-06-13 14:56 ` Chuck Lever
2025-06-13 15:23 ` Benjamin Coddington
2025-06-13 21:05 ` Jeff Layton
0 siblings, 2 replies; 19+ messages in thread
From: Chuck Lever @ 2025-06-13 14:56 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mike Snitzer,
linux-nfs, linux-kernel, linux-trace-kernel, netdev
On 6/13/25 7:33 AM, Benjamin Coddington wrote:
> On 12 Jun 2025, at 12:42, Chuck Lever wrote:
>
>> On 6/12/25 12:15 PM, Jeff Layton wrote:
>>> On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
>>>> On 6/12/25 11:57 AM, Jeff Layton wrote:
>>>>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>>>>> The old nfsdfs interface for starting a server with multiple pools
>>>>>> handles the special case of a single entry array passed down from
>>>>>> userland by distributing the threads over every NUMA node.
>>>>>>
>>>>>> The netlink control interface however constructs an array of length
>>>>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>>>>> defeats the special casing that the old interface relies on.
>>>>>>
>>>>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>>>>> as-is.
>>>>>>
>>>>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>>>>> Reported-by: Mike Snitzer <snitzer@kernel.org>
>>>>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>> ---
>>>>>> fs/nfsd/nfsctl.c | 5 ++---
>>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>>>>> --- a/fs/nfsd/nfsctl.c
>>>>>> +++ b/fs/nfsd/nfsctl.c
>>>>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>>>>> */
>>>>>> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>> {
>>>>>> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>>>>> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>>>>> struct net *net = genl_info_net(info);
>>>>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>>>> const struct nlattr *attr;
>>>>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>> /* count number of SERVER_THREADS values */
>>>>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>>> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>>>>> - count++;
>>>>>> + nrpools++;
>>>>>> }
>>>>>>
>>>>>> mutex_lock(&nfsd_mutex);
>>>>>>
>>>>>> - nrpools = max(count, nfsd_nrpools(net));
>>>>>> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>>>>> if (!nthreads) {
>>>>>> ret = -ENOMEM;
>>>>>
>>>>> I noticed that this didn't go in to the recent merge window.
>>>>>
>>>>> This patch fixes a rather nasty regression when you try to start the
>>>>> server on a NUMA-capable box.
>>>>
>>>> The NFSD netlink interface is not broadly used yet, is it?
>>>>
>>>
>>> It is. RHEL10 shipped with it, for instance and it's been in Fedora for
>>> a while.
>>
>> RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
>> either of these are deployed in production environments yet. Just sayin
>> that in this case, the Bayesian filter leans towards waiting a full dev
>> cycle.
>
> We don't consider it acceptable to allow known defects to persist in our
> products just because they are bleeding edge.
I'm not letting this issue persist. Proper testing takes time.
The patch description and discussion around this change did not include
any information about its pervasiveness and only a little about its
severity. I used my best judgement and followed my usual rules, which
are:
1. Crashers, data corrupters, and security bugs with public bug reports
and confirmed fix effectiveness go in as quickly as we can test.
Note well that we have to balance the risk of introducing regressions
in this case, since going in quickly means the fix lacks significant
test experience.
1a. Rashes and bug bites require application of topical hydrocortisone.
2. Patches sit in nfsd-testing for at least two weeks; better if they
are there for four. I have CI running daily on that branch, and
sometimes it takes a while for a problem to surface and be noticed.
3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
as it takes for them to matriculate into linux-next and fs-next.
4. If the patch fixes an issue that was introduced in the most recent
merge window, it goes in -fixes .
5. If the patch fixes an issue that is already in released kernels
(and we are at rule 5 because the patch does not fix an immediate
issue) then it goes in -next .
These evidence-oriented guidelines are in place to ensure that we don't
panic and rush commits into the kernel without careful review and
testing. There have been plenty of times when a fix that was pushed
urgently was not complete or even made things worse. It's a long
pipeline on purpose.
The issues with this patch were:
- It was posted very late in the dev cycle for v6.16. (Jeff's urgent
fixes always seem to happen during -rc7 ;-)
- The Fixes: tag refers to a commit that was several releases ago, and
I am not aware of specific reports of anyone hitting a similar issue.
- IME, the adoption of enterprise distributions is slow. RHEL 10 is
still only on its GA release. Therefore my estimation is that the
number of potentially impacted customers will be small for some time,
enough time for us to test Jeff's fix appropriately.
- The issue did not appear to me to be severe, but maybe I didn't read
the patch description carefully enough.
- Although I respect, admire, and greatly appreciate the effort Jeff
made to nail this one, that does not mean it is a pervasive problem.
Jeff is quite capable of applying his own work to the kernels he and
his employer care about.
>>>> Since this one came in late during the 6.16 dev cycle and the Fixes: tag
>>>> references a commit that is already in released kernels, I put in the
>>>> "next merge window" pile. On it's own it doesn't look urgent to me.
>>>>
>>>
>>> I'd really like to see this go in soon and to stable. If you want me to
>>> respin the changelog, I can. It's not a crash, but it manifests as lost
>>> RPCs that just hang. It took me quite a while to figure out what was
>>> going on, and I'd prefer that we not put users through that.
>>
>> If someone can confirm that it is effective, I'll add it to nfsd-fixes.
>
> I'm sure it is if Jeff spent time on it.
>
> We're going to try to get this into RHEL-10 ASAP, because dropped RPCs
> manifest as datacenter-wide problems that are very hard to diagnose.
It sounds like Red Hat also does not have clear evidence that links this
patch to a specific failure experienced by your customers. This affirms
my understanding that this fix is defensive rather than urgent.
As a rule, defensive fixes go in during merge windows.
> Its a
> real pain that we won't have an upstream commit assigned for it.
It's not reasonable for any upstream maintainer not employed by Red Hat
to know about or cleave to Red Hat's internal processes. But, if an
issue is on Red Hat's radar, then you are welcome to make its priority
known to me so I can schedule fixes appropriately.
All that said, I've promoted the fix to nfsd-fixes, since it's narrow
and has several weeks of test experience now.
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-13 14:56 ` Chuck Lever
@ 2025-06-13 15:23 ` Benjamin Coddington
2025-06-13 15:38 ` Chuck Lever
2025-06-13 21:05 ` Jeff Layton
1 sibling, 1 reply; 19+ messages in thread
From: Benjamin Coddington @ 2025-06-13 15:23 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mike Snitzer,
linux-nfs, linux-kernel, linux-trace-kernel, netdev
On 13 Jun 2025, at 10:56, Chuck Lever wrote:
> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
>> We don't consider it acceptable to allow known defects to persist in our
>> products just because they are bleeding edge.
>
> I'm not letting this issue persist. Proper testing takes time.
>
> The patch description and discussion around this change did not include
> any information about its pervasiveness and only a little about its
> severity. I used my best judgement and followed my usual rules, which
> are:
>
> 1. Crashers, data corrupters, and security bugs with public bug reports
> and confirmed fix effectiveness go in as quickly as we can test.
> Note well that we have to balance the risk of introducing regressions
> in this case, since going in quickly means the fix lacks significant
> test experience.
>
> 1a. Rashes and bug bites require application of topical hydrocortisone.
:) no rash here, this response is very soothing.
> 2. Patches sit in nfsd-testing for at least two weeks; better if they
> are there for four. I have CI running daily on that branch, and
> sometimes it takes a while for a problem to surface and be noticed.
>
> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
> as it takes for them to matriculate into linux-next and fs-next.
>
> 4. If the patch fixes an issue that was introduced in the most recent
> merge window, it goes in -fixes .
>
> 5. If the patch fixes an issue that is already in released kernels
> (and we are at rule 5 because the patch does not fix an immediate
> issue) then it goes in -next .
>
> These evidence-oriented guidelines are in place to ensure that we don't
> panic and rush commits into the kernel without careful review and
> testing. There have been plenty of times when a fix that was pushed
> urgently was not complete or even made things worse. It's a long
> pipeline on purpose.
I totally understand, thanks very much for having a set of rules and
guidelines and even more for taking the time to spell them out here.
I wanted to express that Red Hat does consider all of its releases to be
important to fix and maintain. I'd like to speak against arguments about fix
urgency based on distro versions. I think in this case we innocently crept
into these arguments as Jeff presented evidence that the problem exists in
the wild.
> The issues with this patch were:
>
> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
> fixes always seem to happen during -rc7 ;-)
>
> - The Fixes: tag refers to a commit that was several releases ago, and
> I am not aware of specific reports of anyone hitting a similar issue.
>
> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
> still only on its GA release. Therefore my estimation is that the
> number of potentially impacted customers will be small for some time,
> enough time for us to test Jeff's fix appropriately.
While this is true, I hope we can still treat every release version equally
/if/ we make any arguments about urgency based on what's currently released
in a particular distro. Your point is a good counter-arguement to Jeff's
assertion that the problem has been widely distributed - but it does start
to creep into a space which feels like we're treating certain early versions
of a specific distro differently and didn't sit well for me. I'd rather not
have our upstream work or decisions appear to favor a particular distro.
> - The issue did not appear to me to be severe, but maybe I didn't read
> the patch description carefully enough.
>
> - Although I respect, admire, and greatly appreciate the effort Jeff
> made to nail this one, that does not mean it is a pervasive problem.
> Jeff is quite capable of applying his own work to the kernels he and
> his employer care about.
>
<snip>
>
> It sounds like Red Hat also does not have clear evidence that links this
> patch to a specific failure experienced by your customers. This affirms
> my understanding that this fix is defensive rather than urgent.
Also true - not yet, but there's a significant lag between customers
discovering a problem and our engineers knowing about it, and during that
lag all sorts of time, money, and reputation points are lost.
> As a rule, defensive fixes go in during merge windows.
>
>> Its a real pain that we won't have an upstream commit assigned for it.
>
> It's not reasonable for any upstream maintainer not employed by Red Hat
> to know about or cleave to Red Hat's internal processes. But, if an
> issue is on Red Hat's radar, then you are welcome to make its priority
> known to me so I can schedule fixes appropriately.
Thanks! I realize that, which is why I spoke up.
> All that said, I've promoted the fix to nfsd-fixes, since it's narrow
> and has several weeks of test experience now.
Again, thanks! We greatly appreciate the work you're doing.
Best,
Ben
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-13 15:23 ` Benjamin Coddington
@ 2025-06-13 15:38 ` Chuck Lever
0 siblings, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2025-06-13 15:38 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mike Snitzer,
linux-nfs, linux-kernel, linux-trace-kernel, netdev
On 6/13/25 11:23 AM, Benjamin Coddington wrote:
> On 13 Jun 2025, at 10:56, Chuck Lever wrote:
>
>> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
>>> We don't consider it acceptable to allow known defects to persist in our
>>> products just because they are bleeding edge.
>>
>> I'm not letting this issue persist. Proper testing takes time.
>>
>> The patch description and discussion around this change did not include
>> any information about its pervasiveness and only a little about its
>> severity. I used my best judgement and followed my usual rules, which
>> are:
>>
>> 1. Crashers, data corrupters, and security bugs with public bug reports
>> and confirmed fix effectiveness go in as quickly as we can test.
>> Note well that we have to balance the risk of introducing regressions
>> in this case, since going in quickly means the fix lacks significant
>> test experience.
>>
>> 1a. Rashes and bug bites require application of topical hydrocortisone.
>
> :) no rash here, this response is very soothing.
>
>> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>> are there for four. I have CI running daily on that branch, and
>> sometimes it takes a while for a problem to surface and be noticed.
>>
>> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>> as it takes for them to matriculate into linux-next and fs-next.
>>
>> 4. If the patch fixes an issue that was introduced in the most recent
>> merge window, it goes in -fixes .
>>
>> 5. If the patch fixes an issue that is already in released kernels
>> (and we are at rule 5 because the patch does not fix an immediate
>> issue) then it goes in -next .
>>
>> These evidence-oriented guidelines are in place to ensure that we don't
>> panic and rush commits into the kernel without careful review and
>> testing. There have been plenty of times when a fix that was pushed
>> urgently was not complete or even made things worse. It's a long
>> pipeline on purpose.
>
> I totally understand, thanks very much for having a set of rules and
> guidelines and even more for taking the time to spell them out here.
Apologies for the length. I wanted to get these out in the open just
so you and others can slap me with a clue bat if I'm doing something
vastly strange or inappropriate.
> I wanted to express that Red Hat does consider all of its releases to be
> important to fix and maintain. I'd like to speak against arguments about fix
> urgency based on distro versions. I think in this case we innocently crept
> into these arguments as Jeff presented evidence that the problem exists in
> the wild.
I was estimating pervasiveness based on the position of the RHEL 10
distro in its life cycle, nothing more.
>> The issues with this patch were:
>>
>> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>> fixes always seem to happen during -rc7 ;-)
>>
>> - The Fixes: tag refers to a commit that was several releases ago, and
>> I am not aware of specific reports of anyone hitting a similar issue.
>>
>> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>> still only on its GA release. Therefore my estimation is that the
>> number of potentially impacted customers will be small for some time,
>> enough time for us to test Jeff's fix appropriately.
>
> While this is true, I hope we can still treat every release version equally
> /if/ we make any arguments about urgency based on what's currently released
> in a particular distro. Your point is a good counter-arguement to Jeff's
> assertion that the problem has been widely distributed - but it does start
> to creep into a space which feels like we're treating certain early versions
> of a specific distro differently and didn't sit well for me. I'd rather not
> have our upstream work or decisions appear to favor a particular distro.
Understood. I hope I convinced you that I was merely making an evidence-
based estimation about the pervasiveness of any problem this patch might
have been attempting to address.
The shorthand term "bleeding edge" was not intended to be disrespectful,
only descriptive.
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-12 15:57 ` Jeff Layton
2025-06-12 16:05 ` Chuck Lever
@ 2025-06-13 18:57 ` Mike Snitzer
2025-06-13 19:00 ` Chuck Lever
1 sibling, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2025-06-13 18:57 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, linux-nfs,
linux-kernel, linux-trace-kernel, netdev
On Thu, Jun 12, 2025 at 11:57:59AM -0400, Jeff Layton wrote:
> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
> > The old nfsdfs interface for starting a server with multiple pools
> > handles the special case of a single entry array passed down from
> > userland by distributing the threads over every NUMA node.
> >
> > The netlink control interface however constructs an array of length
> > nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> > defeats the special casing that the old interface relies on.
> >
> > Change nfsd_nl_threads_set_doit() to pass down the array from userland
> > as-is.
> >
> > Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> > Reported-by: Mike Snitzer <snitzer@kernel.org>
> > Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/nfsd/nfsctl.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> > */
> > int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > {
> > - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
> > + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
> > struct net *net = genl_info_net(info);
> > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > const struct nlattr *attr;
> > @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > /* count number of SERVER_THREADS values */
> > nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > if (nla_type(attr) == NFSD_A_SERVER_THREADS)
> > - count++;
> > + nrpools++;
> > }
> >
> > mutex_lock(&nfsd_mutex);
> >
> > - nrpools = max(count, nfsd_nrpools(net));
> > nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
> > if (!nthreads) {
> > ret = -ENOMEM;
>
> I noticed that this didn't go in to the recent merge window.
>
> This patch fixes a rather nasty regression when you try to start the
> server on a NUMA-capable box. It all looks like it works, but some RPCs
> get silently dropped on the floor (if they happen to be received into a
> node with no threads). It took me a while to track down the problem
> after Mike reported it.
>
> Can we go ahead and pull this in and send it to stable?
>
> Also, did this patch fix the problem for you, Mike?
Hi Jeff,
I saw your other mail asking the same, figured it best to reply to this
thread with the patch.
YES, I just verified this patch fixes the issue I reported. I didn't
think I was critical path for confirming the fix, and since I had
worked around it (by downgrading nfs-utils from EL10's 2.8.2 to EL9's
2.5.4 it wasn't a super quick thing for me to test.. it became
out-of-sight-out-of-mind...
BTW, Chuck, I think the reason there aren't many/any reports (even
with RHEL10 or Fedora users) is that the user needs to:
1) have a NUMA system
2) explicitly change sunrpc's default for pool_mode from global to pernode.
Anyway:
Tested-by: Mike Snitzer <snitzer@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-13 18:57 ` Mike Snitzer
@ 2025-06-13 19:00 ` Chuck Lever
0 siblings, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2025-06-13 19:00 UTC (permalink / raw)
To: Mike Snitzer, Jeff Layton
Cc: Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, linux-nfs,
linux-kernel, linux-trace-kernel, netdev
On 6/13/25 2:57 PM, Mike Snitzer wrote:
> On Thu, Jun 12, 2025 at 11:57:59AM -0400, Jeff Layton wrote:
>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>> The old nfsdfs interface for starting a server with multiple pools
>>> handles the special case of a single entry array passed down from
>>> userland by distributing the threads over every NUMA node.
>>>
>>> The netlink control interface however constructs an array of length
>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>> defeats the special casing that the old interface relies on.
>>>
>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>> as-is.
>>>
>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>> Reported-by: Mike Snitzer <snitzer@kernel.org>
>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>> fs/nfsd/nfsctl.c | 5 ++---
>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>> --- a/fs/nfsd/nfsctl.c
>>> +++ b/fs/nfsd/nfsctl.c
>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>> */
>>> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>> {
>>> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>> struct net *net = genl_info_net(info);
>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>> const struct nlattr *attr;
>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>> /* count number of SERVER_THREADS values */
>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>> - count++;
>>> + nrpools++;
>>> }
>>>
>>> mutex_lock(&nfsd_mutex);
>>>
>>> - nrpools = max(count, nfsd_nrpools(net));
>>> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>> if (!nthreads) {
>>> ret = -ENOMEM;
>>
>> I noticed that this didn't go in to the recent merge window.
>>
>> This patch fixes a rather nasty regression when you try to start the
>> server on a NUMA-capable box. It all looks like it works, but some RPCs
>> get silently dropped on the floor (if they happen to be received into a
>> node with no threads). It took me a while to track down the problem
>> after Mike reported it.
>>
>> Can we go ahead and pull this in and send it to stable?
>>
>> Also, did this patch fix the problem for you, Mike?
>
> Hi Jeff,
>
> I saw your other mail asking the same, figured it best to reply to this
> thread with the patch.
>
> YES, I just verified this patch fixes the issue I reported. I didn't
> think I was critical path for confirming the fix, and since I had
> worked around it (by downgrading nfs-utils from EL10's 2.8.2 to EL9's
> 2.5.4 it wasn't a super quick thing for me to test.. it became
> out-of-sight-out-of-mind...
>
> BTW, Chuck, I think the reason there aren't many/any reports (even
> with RHEL10 or Fedora users) is that the user needs to:
> 1) have a NUMA system
> 2) explicitly change sunrpc's default for pool_mode from global to pernode.
Not a very common thing to do, IME.
> Anyway:
>
> Tested-by: Mike Snitzer <snitzer@kernel.org>
Tag applied, thanks.
--
Chuck Lever
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
2025-06-13 14:56 ` Chuck Lever
2025-06-13 15:23 ` Benjamin Coddington
@ 2025-06-13 21:05 ` Jeff Layton
1 sibling, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2025-06-13 21:05 UTC (permalink / raw)
To: Chuck Lever, Benjamin Coddington
Cc: Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mike Snitzer,
linux-nfs, linux-kernel, linux-trace-kernel, netdev
On Fri, 2025-06-13 at 10:56 -0400, Chuck Lever wrote:
> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
> > On 12 Jun 2025, at 12:42, Chuck Lever wrote:
> >
> > > On 6/12/25 12:15 PM, Jeff Layton wrote:
> > > > On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
> > > > > On 6/12/25 11:57 AM, Jeff Layton wrote:
> > > > > > On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
> > > > > > > The old nfsdfs interface for starting a server with multiple pools
> > > > > > > handles the special case of a single entry array passed down from
> > > > > > > userland by distributing the threads over every NUMA node.
> > > > > > >
> > > > > > > The netlink control interface however constructs an array of length
> > > > > > > nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> > > > > > > defeats the special casing that the old interface relies on.
> > > > > > >
> > > > > > > Change nfsd_nl_threads_set_doit() to pass down the array from userland
> > > > > > > as-is.
> > > > > > >
> > > > > > > Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> > > > > > > Reported-by: Mike Snitzer <snitzer@kernel.org>
> > > > > > > Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > ---
> > > > > > > fs/nfsd/nfsctl.c | 5 ++---
> > > > > > > 1 file changed, 2 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > > > > > index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
> > > > > > > --- a/fs/nfsd/nfsctl.c
> > > > > > > +++ b/fs/nfsd/nfsctl.c
> > > > > > > @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> > > > > > > */
> > > > > > > int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > > > > > {
> > > > > > > - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
> > > > > > > + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
> > > > > > > struct net *net = genl_info_net(info);
> > > > > > > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > > > > > const struct nlattr *attr;
> > > > > > > @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > > > > > /* count number of SERVER_THREADS values */
> > > > > > > nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > > > > > if (nla_type(attr) == NFSD_A_SERVER_THREADS)
> > > > > > > - count++;
> > > > > > > + nrpools++;
> > > > > > > }
> > > > > > >
> > > > > > > mutex_lock(&nfsd_mutex);
> > > > > > >
> > > > > > > - nrpools = max(count, nfsd_nrpools(net));
> > > > > > > nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
> > > > > > > if (!nthreads) {
> > > > > > > ret = -ENOMEM;
> > > > > >
> > > > > > I noticed that this didn't go in to the recent merge window.
> > > > > >
> > > > > > This patch fixes a rather nasty regression when you try to start the
> > > > > > server on a NUMA-capable box.
> > > > >
> > > > > The NFSD netlink interface is not broadly used yet, is it?
> > > > >
> > > >
> > > > It is. RHEL10 shipped with it, for instance and it's been in Fedora for
> > > > a while.
> > >
> > > RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
> > > either of these are deployed in production environments yet. Just sayin
> > > that in this case, the Bayesian filter leans towards waiting a full dev
> > > cycle.
> >
> > We don't consider it acceptable to allow known defects to persist in our
> > products just because they are bleeding edge.
>
> I'm not letting this issue persist. Proper testing takes time.
>
> The patch description and discussion around this change did not include
> any information about its pervasiveness and only a little about its
> severity. I used my best judgement and followed my usual rules, which
> are:
>
> 1. Crashers, data corrupters, and security bugs with public bug reports
> and confirmed fix effectiveness go in as quickly as we can test.
> Note well that we have to balance the risk of introducing regressions
> in this case, since going in quickly means the fix lacks significant
> test experience.
>
> 1a. Rashes and bug bites require application of topical hydrocortisone.
>
> 2. Patches sit in nfsd-testing for at least two weeks; better if they
> are there for four. I have CI running daily on that branch, and
> sometimes it takes a while for a problem to surface and be noticed.
>
> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
> as it takes for them to matriculate into linux-next and fs-next.
>
> 4. If the patch fixes an issue that was introduced in the most recent
> merge window, it goes in -fixes .
>
> 5. If the patch fixes an issue that is already in released kernels
> (and we are at rule 5 because the patch does not fix an immediate
> issue) then it goes in -next .
>
> These evidence-oriented guidelines are in place to ensure that we don't
> panic and rush commits into the kernel without careful review and
> testing. There have been plenty of times when a fix that was pushed
> urgently was not complete or even made things worse. It's a long
> pipeline on purpose.
>
> The issues with this patch were:
>
> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
> fixes always seem to happen during -rc7 ;-)
>
> - The Fixes: tag refers to a commit that was several releases ago, and
> I am not aware of specific reports of anyone hitting a similar issue.
>
> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
> still only on its GA release. Therefore my estimation is that the
> number of potentially impacted customers will be small for some time,
> enough time for us to test Jeff's fix appropriately.
>
> - The issue did not appear to me to be severe, but maybe I didn't read
> the patch description carefully enough.
>
> - Although I respect, admire, and greatly appreciate the effort Jeff
> made to nail this one, that does not mean it is a pervasive problem.
> Jeff is quite capable of applying his own work to the kernels he and
> his employer care about.
>
The rules all make sense to me. In this case though, I felt the
potential harm from not taking this patch outweighed the risk. NUMA
hardware is more prevalent than you might think.
>
> > > > > Since this one came in late during the 6.16 dev cycle and the Fixes: tag
> > > > > references a commit that is already in released kernels, I put in the
> > > > > "next merge window" pile. On it's own it doesn't look urgent to me.
> > > > >
> > > >
> > > > I'd really like to see this go in soon and to stable. If you want me to
> > > > respin the changelog, I can. It's not a crash, but it manifests as lost
> > > > RPCs that just hang. It took me quite a while to figure out what was
> > > > going on, and I'd prefer that we not put users through that.
> > >
> > > If someone can confirm that it is effective, I'll add it to nfsd-fixes.
> >
> > I'm sure it is if Jeff spent time on it.
> >
> > We're going to try to get this into RHEL-10 ASAP, because dropped RPCs
> > manifest as datacenter-wide problems that are very hard to diagnose.
>
> It sounds like Red Hat also does not have clear evidence that links this
> patch to a specific failure experienced by your customers. This affirms
> my understanding that this fix is defensive rather than urgent.
>
> As a rule, defensive fixes go in during merge windows.
>
>
> > Its a
> > real pain that we won't have an upstream commit assigned for it.
>
> It's not reasonable for any upstream maintainer not employed by Red Hat
> to know about or cleave to Red Hat's internal processes. But, if an
> issue is on Red Hat's radar, then you are welcome to make its priority
> known to me so I can schedule fixes appropriately.
>
> All that said, I've promoted the fix to nfsd-fixes, since it's narrow
> and has several weeks of test experience now.
>
Many thanks!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-06-13 21:05 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-28 0:12 [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface Jeff Layton
2025-05-28 0:12 ` [PATCH 1/2] nfsd: use threads array as-is in netlink interface Jeff Layton
2025-05-28 17:07 ` Simon Horman
2025-06-12 15:57 ` Jeff Layton
2025-06-12 16:05 ` Chuck Lever
2025-06-12 16:15 ` Jeff Layton
2025-06-12 16:42 ` Chuck Lever
2025-06-13 11:33 ` Benjamin Coddington
2025-06-13 14:56 ` Chuck Lever
2025-06-13 15:23 ` Benjamin Coddington
2025-06-13 15:38 ` Chuck Lever
2025-06-13 21:05 ` Jeff Layton
2025-06-13 18:57 ` Mike Snitzer
2025-06-13 19:00 ` Chuck Lever
2025-05-28 0:12 ` [PATCH 2/2] sunrpc: new tracepoints around svc thread wakeups Jeff Layton
2025-05-28 17:13 ` Simon Horman
2025-05-28 18:11 ` [PATCH 0/2] nfsd: use threads array as-is in netlink thread set interface cel
2025-05-28 18:22 ` Jeff Layton
2025-05-28 18:25 ` Chuck Lever
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).