From: Benjamin Coddington <ben.coddington@hammerspace.com>
To: Chuck Lever <cel@kernel.org>, Jeff Layton <jlayton@kernel.org>,
	NeilBrown <neilb@ownmail.net>
Cc: linux-nfs@vger.kernel.org, Daire Byrne <daire@brahma.io>
Subject: [PATCH RFC 0/4] nfsd: per-client fair-queue dispatch
Date: Wed, 3 Jun 2026 11:09:38 -0400
Message-ID: <cover.1780498019.git.bcodding@hammerspace.com>

knfsd dispatches ready transports from a single per-pool FIFO, so a
client's share of nfsd service scales with the number of connections it
holds rather than being divided evenly among clients. A client that opens
many connections (nconnect, or a farm of data movers) starves other
clients on the same server purely by outnumbering them in sockets.
I measured this with a load generator that pins each request to a fixed
service time and does no filesystem work, so that nfsd thread-time is the
only scarce resource (8 threads, 10ms/op, ~648 ops/s pool ceiling). A
greedy client opens K connections alongside one single-connection
interactive client.
NFSv3, dispatch as it is today:

  greedy K   greedy share   interactive ops/s
         1            50%                 241
         4            80%                 129
         8            89%                  72
        16            94%                  38

The interactive client's share tracks 1/(K+1), and its throughput falls
roughly 6x while it does nothing different. NFSv4.1 behaves identically
(89% greedy at K=8) even when the greedy connections are bound to a
single session, because the dispatch decision is made in sunrpc, below
the level at which the NFS version is known.
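
The 1/(K+1) relationship can be checked with a few lines of arithmetic.
The sketch below is illustrative Python and not part of the series. It
assumes every connection always has a request ready, so that a per-pool
FIFO serves connections in equal turns; fifo_shares is a hypothetical
helper name.

```python
def fifo_shares(greedy_conns, interactive_conns=1):
    """Idealised service shares when dispatch is per connection.

    Assumption: every connection is always ready, so each one gets an
    equal turn at the FIFO regardless of which client owns it.
    """
    total = greedy_conns + interactive_conns
    return greedy_conns / total, interactive_conns / total


for k in (1, 4, 8, 16):
    greedy, interactive = fifo_shares(k)
    print(f"K={k:2d}  greedy {greedy:.0%}  interactive {interactive:.0%}")
```

Under that assumption the greedy shares come out at 50%, 80%, 89% and
94% for K = 1, 4, 8 and 16, which agrees with the measured column above.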
The same NFSv4.1 workload with fair queueing enabled:

  greedy K   greedy share   interactive ops/s
         8            72%                 182
        16            73%                 177
        32            70%                 193

The greedy client's share no longer climbs with its connection count and
the interactive client recovers (72 -> 182 ops/s at K=8). Aggregate
throughput is unchanged: the T/D pool ceiling is the same with fair
queueing on and off. The split does not reach 50/50 because a single
interactive connection is bounded by its request window and by XPT_BUSY
serialising one transport; with a deeper window it reaches ~59/41.
The approach:

 - sunrpc grows an opaque per-transport fairness key (patch 1), with a
   default derived from the source address (the source port is excluded
   so a client's several connections share one key), and an opt-in
   per-pool scheduler that buckets ready transports by that key and
   dispatches round-robin across keys (patch 2). When it is disabled,
   which is the default, the existing lockless FIFO path is unchanged.

 - nfsd gains a "fairq" module parameter to turn it on (patch 3) and
   stamps the NFSv4.1 clientid as the key when a connection binds to a
   session (patch 4), so all of a client's connections share one key.
   NFSv3 uses the source-address default.

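
The bucketing and round-robin idea can be modelled in user space as
follows. This is a minimal Python sketch for illustration, not the kernel
code: the patches use a fixed bucket hash under a per-pool spinlock, while
the sketch swaps in an ordered dictionary. default_key, FairQueue and the
transport names are hypothetical.

```python
import ipaddress
from collections import OrderedDict, deque


def default_key(addr, port):
    """Hypothetical default key: source address only, port ignored."""
    return ipaddress.ip_address(addr).packed


class FairQueue:
    """Bucket ready transports by key; dequeue round-robin across keys."""

    def __init__(self):
        self.buckets = OrderedDict()  # key -> deque of ready transports

    def enqueue(self, key, xprt):
        self.buckets.setdefault(key, deque()).append(xprt)

    def dequeue(self):
        if not self.buckets:
            return None
        # Take the key at the front of the rotation.
        key, ready = self.buckets.popitem(last=False)
        xprt = ready.popleft()
        if ready:
            # Key still has ready transports: move it to the back.
            self.buckets[key] = ready
        return xprt


fq = FairQueue()
for port in range(4):
    # Four connections from one address collapse into one key.
    fq.enqueue(default_key("10.0.0.1", 1000 + port), f"greedy-{port}")
fq.enqueue(default_key("10.0.0.2", 999), "interactive-0")

order = [fq.dequeue() for _ in range(5)]
print(order)
```

With a plain FIFO the interactive transport would wait behind all four
greedy ones; here it is served second, after one greedy transport,
because the four greedy connections share a single key.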
This is an RFC; a few questions for the list:

 - Unit of fairness: clientid (used here) or session? Earlier
   discussion leaned toward exploring per-session.

 - Mechanism: a fixed bucket hash under a per-pool spinlock taken only
   on the opt-in path, versus a lockless or per-flow structure.

 - Would a per-client in-flight cap be preferable to proportional fair
   queueing?

The measurement used a debug-only filehandle-latency hook that is not
part of this series.

Benjamin Coddington (4):
  sunrpc: add a per-transport fairness key to svc_xprt
  sunrpc: dispatch ready transports fairly per client
  nfsd: add a fairq module parameter
  nfsd: key NFSv4.1 connections by clientid for fair queueing

 fs/nfsd/nfs4state.c             |  17 +++
 fs/nfsd/nfssvc.c                |  19 +++
 include/linux/sunrpc/svc.h      |   5 +
 include/linux/sunrpc/svc_xprt.h |  46 ++++++-
 net/sunrpc/svc.c                |   2 +
 net/sunrpc/svc_xprt.c           | 216 +++++++++++++++++++++++++++++++-
 6 files changed, 302 insertions(+), 3 deletions(-)

base-commit: e7ca66ba17f1b5e4ecbb29b9c3c4a31aa062bed0
--
2.53.0