* [PATCH 1/3] sunrpc: skip svc_xprt_enqueue when no work is pending
2026-03-24 13:04 [PATCH 0/3] Avoid no-op transport enqueues Chuck Lever
@ 2026-03-24 13:04 ` Chuck Lever
2026-03-24 13:26 ` Jeff Layton
2026-03-24 13:04 ` [PATCH 2/3] sunrpc: skip svc_xprt_enqueue in svc_xprt_received when idle Chuck Lever
2026-03-24 13:04 ` [PATCH 3/3] sunrpc: skip svc_xprt_enqueue when transport is busy Chuck Lever
2 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2026-03-24 13:04 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>

svc_reserve() and svc_xprt_release_slot() call
svc_xprt_enqueue() after modifying xpt_reserved or
xpt_nr_rqsts. The purpose is to re-dispatch the
transport when write-space or a slot becomes available.
However, when neither XPT_DATA nor XPT_DEFERRED is
set, no thread can make progress on the transport and
the enqueue accomplishes nothing.

Trace data from a 256KB NFSv3 WRITE workload over RDMA
shows 11.2 svc_xprt_enqueue() calls per RPC. Of these,
6.9 per RPC lack XPT_DATA and exit svc_xprt_ready()
immediately after executing the smp_rmb(), READ_ONCE(),
and tracepoint. svc_reserve() and svc_xprt_release_slot()
account for roughly five of these per RPC.

A new helper, svc_xprt_resource_released(), checks
XPT_DATA | XPT_DEFERRED before calling
svc_xprt_enqueue(). The existing smp_wmb() barriers
are upgraded to smp_mb() to ensure the flags check
observes a concurrent producer's set_bit(XPT_DATA).
Each producer (svc_rdma_wc_receive, etc.) both sets
XPT_DATA and calls svc_xprt_enqueue(), so even if the
check reads a stale value, the producer's own enqueue
provides a fallback path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/svc_xprt.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 56a663b8939f..73149280167c 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -425,13 +425,28 @@ static bool svc_xprt_reserve_slot(struct svc_rqst *rqstp, struct svc_xprt *xprt)
 	return true;
 }

+/*
+ * After a caller releases write-space or a request slot,
+ * re-enqueue the transport only when there is pending
+ * work that a thread could act on. The smp_mb() pairs
+ * with the smp_rmb() in svc_xprt_ready() and orders the
+ * preceding counter update before the flags read so a
+ * concurrent set_bit(XPT_DATA) is visible here.
+ */
+static void svc_xprt_resource_released(struct svc_xprt *xprt)
+{
+	smp_mb();
+	if (READ_ONCE(xprt->xpt_flags) &
+	    (BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
+		svc_xprt_enqueue(xprt);
+}
+
 static void svc_xprt_release_slot(struct svc_rqst *rqstp)
 {
 	struct svc_xprt *xprt = rqstp->rq_xprt;
 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
 		atomic_dec(&xprt->xpt_nr_rqsts);
-		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
-		svc_xprt_enqueue(xprt);
+		svc_xprt_resource_released(xprt);
 	}
 }

@@ -525,10 +540,10 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
 	space += rqstp->rq_res.head[0].iov_len;

 	if (xprt && space < rqstp->rq_reserved) {
-		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
+		atomic_sub((rqstp->rq_reserved - space),
+			   &xprt->xpt_reserved);
 		rqstp->rq_reserved = space;
-		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
-		svc_xprt_enqueue(xprt);
+		svc_xprt_resource_released(xprt);
 	}
 }
 EXPORT_SYMBOL_GPL(svc_reserve);
--
2.53.0
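
[Editorial sketch, not part of the patch] The gate that PATCH 1/3
describes can be modeled in userspace C11. Everything below is an
assumption made for illustration: the struct, the bit numbers, and
all model_* names are hypothetical, atomic_thread_fence() with
memory_order_seq_cst stands in for smp_mb(), a relaxed atomic load
stands in for READ_ONCE(), and a plain counter stands in for
svc_xprt_enqueue(). It shows only the control flow, and makes no
claim about the kernel memory model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel defines the real XPT_* values. */
enum { MODEL_XPT_DATA = 1, MODEL_XPT_DEFERRED = 2 };
#define MODEL_BIT(nr) (1UL << (nr))

struct model_xprt {
	atomic_ulong flags;	/* stands in for xpt_flags */
	atomic_int nr_rqsts;	/* stands in for xpt_nr_rqsts */
	int enqueue_calls;	/* counts simulated svc_xprt_enqueue() calls */
};

/* Stand-in for svc_xprt_enqueue(): only records that it was called. */
static void model_enqueue(struct model_xprt *xprt)
{
	xprt->enqueue_calls++;
}

/*
 * Model of the new helper: full fence, read the flags once, and
 * enqueue only when DATA or DEFERRED is pending. Returns true when
 * the enqueue was issued.
 */
static bool model_resource_released(struct model_xprt *xprt)
{
	unsigned long flags;

	atomic_thread_fence(memory_order_seq_cst);
	flags = atomic_load_explicit(&xprt->flags, memory_order_relaxed);
	if (flags & (MODEL_BIT(MODEL_XPT_DATA) | MODEL_BIT(MODEL_XPT_DEFERRED))) {
		model_enqueue(xprt);
		return true;
	}
	return false;
}

/* Model of svc_xprt_release_slot(): drop a slot, then run the gate. */
static bool model_release_slot(struct model_xprt *xprt)
{
	atomic_fetch_sub_explicit(&xprt->nr_rqsts, 1, memory_order_relaxed);
	return model_resource_released(xprt);
}
```

With no flags set, model_release_slot() decrements the counter and
returns false without touching enqueue_calls; once the DATA bit is
set, the same call issues exactly one simulated enqueue.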
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] sunrpc: skip svc_xprt_enqueue when no work is pending
2026-03-24 13:04 ` [PATCH 1/3] sunrpc: skip svc_xprt_enqueue when no work is pending Chuck Lever
@ 2026-03-24 13:26 ` Jeff Layton
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2026-03-24 13:26 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Tue, 2026-03-24 at 09:04 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> svc_reserve() and svc_xprt_release_slot() call
> svc_xprt_enqueue() after modifying xpt_reserved or
> xpt_nr_rqsts. The purpose is to re-dispatch the
> transport when write-space or a slot becomes available.
> However, when neither XPT_DATA nor XPT_DEFERRED is
> set, no thread can make progress on the transport and
> the enqueue accomplishes nothing.
>
> Trace data from a 256KB NFSv3 WRITE workload over RDMA
> shows 11.2 svc_xprt_enqueue() calls per RPC. Of these,
> 6.9 per RPC lack XPT_DATA and exit svc_xprt_ready()
> immediately after executing the smp_rmb(), READ_ONCE(),
> and tracepoint. svc_reserve() and svc_xprt_release_slot()
> account for roughly five of these per RPC.
>
> A new helper, svc_xprt_resource_released(), checks
> XPT_DATA | XPT_DEFERRED before calling
> svc_xprt_enqueue(). The existing smp_wmb() barriers
> are upgraded to smp_mb() to ensure the flags check
> observes a concurrent producer's set_bit(XPT_DATA).
> Each producer (svc_rdma_wc_receive, etc.) both sets
> XPT_DATA and calls svc_xprt_enqueue(), so even if the
> check reads a stale value, the producer's own enqueue
> provides a fallback path.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/svc_xprt.c | 25 ++++++++++++++++++++-----
> 1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 56a663b8939f..73149280167c 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -425,13 +425,28 @@ static bool svc_xprt_reserve_slot(struct svc_rqst *rqstp, struct svc_xprt *xprt)
> return true;
> }
>
> +/*
> + * After a caller releases write-space or a request slot,
> + * re-enqueue the transport only when there is pending
> + * work that a thread could act on. The smp_mb() pairs
> + * with the smp_rmb() in svc_xprt_ready() and orders the
> + * preceding counter update before the flags read so a
> + * concurrent set_bit(XPT_DATA) is visible here.
> + */
> +static void svc_xprt_resource_released(struct svc_xprt *xprt)
> +{
> + smp_mb();
> + if (READ_ONCE(xprt->xpt_flags) &
> + (BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
> + svc_xprt_enqueue(xprt);
> +}
> +
> static void svc_xprt_release_slot(struct svc_rqst *rqstp)
> {
> struct svc_xprt *xprt = rqstp->rq_xprt;
> if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
> atomic_dec(&xprt->xpt_nr_rqsts);
> - smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> - svc_xprt_enqueue(xprt);
> + svc_xprt_resource_released(xprt);
> }
> }
>
> @@ -525,10 +540,10 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
> space += rqstp->rq_res.head[0].iov_len;
>
> if (xprt && space < rqstp->rq_reserved) {
> - atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
> + atomic_sub((rqstp->rq_reserved - space),
> + &xprt->xpt_reserved);
> rqstp->rq_reserved = space;
> - smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> - svc_xprt_enqueue(xprt);
> + svc_xprt_resource_released(xprt);
> }
> }
> EXPORT_SYMBOL_GPL(svc_reserve);
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/3] sunrpc: skip svc_xprt_enqueue in svc_xprt_received when idle
2026-03-24 13:04 [PATCH 0/3] Avoid no-op transport enqueues Chuck Lever
2026-03-24 13:04 ` [PATCH 1/3] sunrpc: skip svc_xprt_enqueue when no work is pending Chuck Lever
@ 2026-03-24 13:04 ` Chuck Lever
2026-03-24 13:39 ` Jeff Layton
2026-03-24 13:04 ` [PATCH 3/3] sunrpc: skip svc_xprt_enqueue when transport is busy Chuck Lever
2 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2026-03-24 13:04 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>

svc_xprt_received() unconditionally calls
svc_xprt_enqueue() after clearing XPT_BUSY. When no
work flags are pending, the enqueue traverses
svc_xprt_ready() -- executing an smp_rmb(), READ_ONCE(),
and tracepoint -- before returning false.

Trace data from a 256KB NFSv3 workload over RDMA shows
85% of svc_xprt_received() invocations reach
svc_xprt_enqueue() with no pending work flags. In the
WRITE phase, 167,335 of 196,420 calls find no work; in
the READ phase, 97,165 of 98,276. Each unnecessary call
executes a memory barrier, a flags read, and (when
tracing is active) fires the svc_xprt_enqueue
tracepoint.

Add a flags pre-check between clear_bit(XPT_BUSY) and
svc_xprt_enqueue(). Both the clear and the subsequent
READ_ONCE operate on the same xpt_flags word, so
cache-line serialization of the atomic bitops ensures
the read observes any flag set by a concurrent producer
before the line was acquired for the clear. If a
producer's set_bit occurs after the clear_bit, that
producer's own svc_xprt_enqueue() call observes
!XPT_BUSY and dispatches the transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/svc_xprt.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 73149280167c..36c8437cfd8d 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -234,7 +234,19 @@ void svc_xprt_received(struct svc_xprt *xprt)
 	svc_xprt_get(xprt);
 	smp_mb__before_atomic();
 	clear_bit(XPT_BUSY, &xprt->xpt_flags);
-	svc_xprt_enqueue(xprt);
+
+	/*
+	 * Skip the enqueue when no actionable flags are set.
+	 * Each producer both sets its flag (XPT_DATA, XPT_CLOSE,
+	 * etc.) and calls svc_xprt_enqueue(); if a set_bit races
+	 * with this check, the producer's own enqueue observes
+	 * !XPT_BUSY and dispatches the transport.
+	 */
+	if (READ_ONCE(xprt->xpt_flags) &
+	    (BIT(XPT_CONN) | BIT(XPT_CLOSE) | BIT(XPT_HANDSHAKE) |
+	     BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
+		svc_xprt_enqueue(xprt);
+
 	svc_xprt_put(xprt);
 }
 EXPORT_SYMBOL_GPL(svc_xprt_received);
--
2.53.0
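
[Editorial sketch, not part of the patch] The pre-check that
PATCH 2/3 adds to svc_xprt_received() can be modeled the same way in
userspace C11. All names and bit numbers below are hypothetical. A
sequentially consistent atomic_fetch_and() stands in for
smp_mb__before_atomic() plus clear_bit(), which is an assumption
chosen for simplicity rather than an equivalent ordering, and a
counter stands in for svc_xprt_enqueue().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel defines the real XPT_* values. */
enum {
	MODEL_XPT_BUSY = 0,
	MODEL_XPT_CONN = 1,
	MODEL_XPT_CLOSE = 2,
	MODEL_XPT_DATA = 3,
	MODEL_XPT_DEFERRED = 4,
	MODEL_XPT_HANDSHAKE = 5,
};
#define MODEL_BIT(nr) (1UL << (nr))

/* The five "actionable" flags named in the patch's pre-check. */
#define MODEL_WORK_MASK \
	(MODEL_BIT(MODEL_XPT_CONN) | MODEL_BIT(MODEL_XPT_CLOSE) | \
	 MODEL_BIT(MODEL_XPT_HANDSHAKE) | MODEL_BIT(MODEL_XPT_DATA) | \
	 MODEL_BIT(MODEL_XPT_DEFERRED))

struct model_xprt {
	atomic_ulong flags;	/* stands in for xpt_flags */
	int enqueue_calls;	/* counts simulated svc_xprt_enqueue() calls */
};

/*
 * Model of the patched svc_xprt_received(): clear BUSY, then
 * enqueue only if at least one work flag is set in the same word.
 * Returns true when the enqueue was issued.
 */
static bool model_received(struct model_xprt *xprt)
{
	unsigned long flags;

	atomic_fetch_and_explicit(&xprt->flags, ~MODEL_BIT(MODEL_XPT_BUSY),
				  memory_order_seq_cst);
	flags = atomic_load_explicit(&xprt->flags, memory_order_relaxed);
	if (flags & MODEL_WORK_MASK) {
		xprt->enqueue_calls++;
		return true;
	}
	return false;
}
```

A transport that holds only the BUSY bit leaves model_received()
with no enqueue and an empty flags word; one that also has the CLOSE
bit set is enqueued once and keeps CLOSE pending.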
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] sunrpc: skip svc_xprt_enqueue in svc_xprt_received when idle
2026-03-24 13:04 ` [PATCH 2/3] sunrpc: skip svc_xprt_enqueue in svc_xprt_received when idle Chuck Lever
@ 2026-03-24 13:39 ` Jeff Layton
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2026-03-24 13:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Tue, 2026-03-24 at 09:04 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> svc_xprt_received() unconditionally calls
> svc_xprt_enqueue() after clearing XPT_BUSY. When no
> work flags are pending, the enqueue traverses
> svc_xprt_ready() -- executing an smp_rmb(), READ_ONCE(),
> and tracepoint -- before returning false.
>
> Trace data from a 256KB NFSv3 workload over RDMA shows
> 85% of svc_xprt_received() invocations reach
> svc_xprt_enqueue() with no pending work flags. In the
> WRITE phase, 167,335 of 196,420 calls find no work; in
> the READ phase, 97,165 of 98,276. Each unnecessary call
> executes a memory barrier, a flags read, and (when
> tracing is active) fires the svc_xprt_enqueue
> tracepoint.
>
> Add a flags pre-check between clear_bit(XPT_BUSY) and
> svc_xprt_enqueue(). Both the clear and the subsequent
> READ_ONCE operate on the same xpt_flags word, so
> cache-line serialization of the atomic bitops ensures
> the read observes any flag set by a concurrent producer
> before the line was acquired for the clear. If a
> producer's set_bit occurs after the clear_bit, that
> producer's own svc_xprt_enqueue() call observes
> !XPT_BUSY and dispatches the transport.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/svc_xprt.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 73149280167c..36c8437cfd8d 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -234,7 +234,19 @@ void svc_xprt_received(struct svc_xprt *xprt)
> svc_xprt_get(xprt);
> smp_mb__before_atomic();
> clear_bit(XPT_BUSY, &xprt->xpt_flags);
> - svc_xprt_enqueue(xprt);
> +
> + /*
> + * Skip the enqueue when no actionable flags are set.
> + * Each producer both sets its flag (XPT_DATA, XPT_CLOSE,
> + * etc.) and calls svc_xprt_enqueue(); if a set_bit races
> + * with this check, the producer's own enqueue observes
> + * !XPT_BUSY and dispatches the transport.
> + */
> + if (READ_ONCE(xprt->xpt_flags) &
> + (BIT(XPT_CONN) | BIT(XPT_CLOSE) | BIT(XPT_HANDSHAKE) |
> + BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
> + svc_xprt_enqueue(xprt);
> +
> svc_xprt_put(xprt);
> }
> EXPORT_SYMBOL_GPL(svc_xprt_received);
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3/3] sunrpc: skip svc_xprt_enqueue when transport is busy
2026-03-24 13:04 [PATCH 0/3] Avoid no-op transport enqueues Chuck Lever
2026-03-24 13:04 ` [PATCH 1/3] sunrpc: skip svc_xprt_enqueue when no work is pending Chuck Lever
2026-03-24 13:04 ` [PATCH 2/3] sunrpc: skip svc_xprt_enqueue in svc_xprt_received when idle Chuck Lever
@ 2026-03-24 13:04 ` Chuck Lever
2026-03-24 13:42 ` Jeff Layton
2 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2026-03-24 13:04 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>

svc_xprt_resource_released() calls svc_xprt_enqueue()
whenever XPT_DATA or XPT_DEFERRED is set. During RPC
processing, svc_reserve_auth() reduces the reservation
counter and triggers this path while the current thread
still holds XPT_BUSY. The enqueue enters svc_xprt_ready(),
executes an smp_rmb(), READ_ONCE(), and tracepoint, then
returns false on seeing XPT_BUSY.

Trace data from a 256KB NFSv3 WRITE workload over TCP
shows this pattern generates roughly 195,000 wasted
enqueue calls -- approximately one per RPC -- each
paying the full svc_xprt_ready() cost for no benefit.

Add a BUSY check alongside the existing DATA|DEFERRED
check in svc_xprt_resource_released(). When the
transport is BUSY, the holder will call
svc_xprt_received() upon completion, which already
checks for pending work flags and re-enqueues.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/svc_xprt.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 36c8437cfd8d..d2b8f0396b6a 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -440,16 +440,23 @@ static bool svc_xprt_reserve_slot(struct svc_rqst *rqstp, struct svc_xprt *xprt)
 /*
  * After a caller releases write-space or a request slot,
  * re-enqueue the transport only when there is pending
- * work that a thread could act on. The smp_mb() pairs
+ * work that a thread could act on. The smp_mb() pairs
  * with the smp_rmb() in svc_xprt_ready() and orders the
  * preceding counter update before the flags read so a
  * concurrent set_bit(XPT_DATA) is visible here.
+ *
+ * When the transport is BUSY, the thread holding it will
+ * call svc_xprt_received() upon completion, which checks
+ * for pending work and re-enqueues as needed.
  */
 static void svc_xprt_resource_released(struct svc_xprt *xprt)
 {
+	unsigned long xpt_flags;
+
 	smp_mb();
-	if (READ_ONCE(xprt->xpt_flags) &
-	    (BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
+	xpt_flags = READ_ONCE(xprt->xpt_flags);
+	if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED)) &&
+	    !(xpt_flags & BIT(XPT_BUSY)))
 		svc_xprt_enqueue(xprt);
 }

--
2.53.0
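
[Editorial sketch, not part of the patch] The combined condition
that PATCH 3/3 describes reduces to a pure predicate on one snapshot
of the flags word: enqueue only when DATA or DEFERRED is pending and
BUSY is clear. The userspace C sketch below uses hypothetical bit
numbers and a hypothetical function name; it illustrates the truth
table only and omits the barrier and the actual enqueue.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel defines the real XPT_* values. */
enum { MODEL_XPT_BUSY = 0, MODEL_XPT_DATA = 1, MODEL_XPT_DEFERRED = 2 };
#define MODEL_BIT(nr) (1UL << (nr))

/*
 * Decide from a single flags snapshot whether an enqueue is worth
 * issuing: some work must be pending, and no thread may currently
 * hold the transport.
 */
static bool model_should_enqueue(unsigned long flags)
{
	bool work = (flags & (MODEL_BIT(MODEL_XPT_DATA) |
			      MODEL_BIT(MODEL_XPT_DEFERRED))) != 0;
	bool busy = (flags & MODEL_BIT(MODEL_XPT_BUSY)) != 0;

	return work && !busy;
}
```

Taking one snapshot and testing both conditions against it, as the
patch does with its local xpt_flags variable, keeps the two tests
consistent with each other even if the flags word changes
concurrently.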
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/3] sunrpc: skip svc_xprt_enqueue when transport is busy
2026-03-24 13:04 ` [PATCH 3/3] sunrpc: skip svc_xprt_enqueue when transport is busy Chuck Lever
@ 2026-03-24 13:42 ` Jeff Layton
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2026-03-24 13:42 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Tue, 2026-03-24 at 09:04 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> svc_xprt_resource_released() calls svc_xprt_enqueue()
> whenever XPT_DATA or XPT_DEFERRED is set. During RPC
> processing, svc_reserve_auth() reduces the reservation
> counter and triggers this path while the current thread
> still holds XPT_BUSY. The enqueue enters svc_xprt_ready(),
> executes an smp_rmb(), READ_ONCE(), and tracepoint, then
> returns false on seeing XPT_BUSY.
>
> Trace data from a 256KB NFSv3 WRITE workload over TCP
> shows this pattern generates roughly 195,000 wasted
> enqueue calls -- approximately one per RPC -- each
> paying the full svc_xprt_ready() cost for no benefit.
>
> Add a BUSY check alongside the existing DATA|DEFERRED
> check in svc_xprt_resource_released(). When the
> transport is BUSY, the holder will call
> svc_xprt_received() upon completion, which already
> checks for pending work flags and re-enqueues.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/svc_xprt.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 36c8437cfd8d..d2b8f0396b6a 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -440,16 +440,23 @@ static bool svc_xprt_reserve_slot(struct svc_rqst *rqstp, struct svc_xprt *xprt)
> /*
> * After a caller releases write-space or a request slot,
> * re-enqueue the transport only when there is pending
> - * work that a thread could act on. The smp_mb() pairs
> + * work that a thread could act on. The smp_mb() pairs
> * with the smp_rmb() in svc_xprt_ready() and orders the
> * preceding counter update before the flags read so a
> * concurrent set_bit(XPT_DATA) is visible here.
> + *
> + * When the transport is BUSY, the thread holding it will
> + * call svc_xprt_received() upon completion, which checks
> + * for pending work and re-enqueues as needed.
> */
> static void svc_xprt_resource_released(struct svc_xprt *xprt)
> {
> + unsigned long xpt_flags;
> +
> smp_mb();
> - if (READ_ONCE(xprt->xpt_flags) &
> - (BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
> + xpt_flags = READ_ONCE(xprt->xpt_flags);
> + if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED)) &&
> + !(xpt_flags & BIT(XPT_BUSY)))
> svc_xprt_enqueue(xprt);
> }
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread