* [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference
@ 2026-05-18 18:24 Chuck Lever
2026-05-18 18:24 ` [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit Chuck Lever
` (6 more replies)
0 siblings, 7 replies; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
handshake_nl_accept_doit() has accumulated four follow-on fixes
since 3b3009ea8abb ("net/handshake: Create a NETLINK service for
handling handshake requests"): 7ea9c1ec66bc, 7798b59409c3,
fe67b063f687, and dabac51b8102. Each was a local refcount or
NULL-check correction; none moved where the file reference is
owned, and the same code keeps producing the same class of bug.
Reworking the ownership is what breaks the pattern.
For the duration of a request, sock->file has no single owner.
Submit publishes the request without taking a file reference;
accept_doit acquires one inside the handler, after the request
has already left the pending list. The consumer can drop its
own reference at any time, including the moment between
handshake_req_next() popping the request and accept_doit
reaching get_file(). The submit-side sock_hold() pins only
struct sock; struct socket and sock->file remain under the
consumer's control via the file descriptor.
This series places the file reference under unambiguous
ownership. handshake_req_submit() pins it on the request and
completion or cancel drops it (patches 4-5); the submit-side
sock_hold() then becomes redundant, and dropping it also closes
a publish-before-pin race the late sock_hold itself opened
(patch 6). Patch 3 normalizes the errno sign convention across
the handshake_complete() API and its consumers, and patch 2 is
the matching correction in nvme-tcp. Patch 1 fixes an unrelated
netns-exit leak. Patch 7 adds KUnit file-count assertions to
verify the new refcount contract.
Three things in this restructuring want a careful look. In
handshake_complete(), the fput() of the request's file
reference has to come after hp_done() and after the sock_put()
that may free sk -- fput() can transitively run
handshake_sk_destruct() and free the request, so the patch
stashes hr_file in a local first. handshake_sk_destruct()
itself is kept on purpose: it owns rhashtable removal and
kfree, and remains the backstop if a consumer path bypasses
handshake_complete() entirely. Third, handshake_req_next() now
returns its request with an extra get_file() held under
hn_lock; accept_doit must consume that reference (FD_PREPARE on
success, explicit fput on the fdf.err path), and any future
caller has to honor the same contract.
Patches 1, 3, 5, and 6 carry Fixes: 3b3009ea8abb and should
travel to stable together. Patch 2 carries Fixes: be8e82caa685.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Chuck Lever (7):
net/handshake: Drain pending requests at net namespace exit
nvme-tcp: store negative errno in queue->tls_err
net/handshake: Pass negative errno through handshake_complete()
net/handshake: Take a long-lived file reference at submit
net/handshake: hand off the pinned file reference to accept_doit
net/handshake: Close the submit-side sock_hold race
net/handshake: Verify file-reference balance in submit paths
drivers/nvme/host/tcp.c | 2 +-
net/handshake/handshake-test.c | 38 +++++++++++++++++++++++++++++++-
net/handshake/handshake.h | 6 +++--
net/handshake/netlink.c | 26 ++++++++++------------
net/handshake/request.c | 50 ++++++++++++++++++++++++++++++++++--------
net/handshake/tlshd.c | 4 ++--
6 files changed, 96 insertions(+), 30 deletions(-)
---
base-commit: f5b2772d14884f4be9e718644f1203d4d0e6f0d6
change-id: 20260516-handshake-file-pin-21390f5f3513
Best regards,
--
Chuck Lever <chuck.lever@oracle.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 7:33 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err Chuck Lever
` (5 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The arguments to list_splice_init() in handshake_net_exit() are
reversed. The call moves the local empty "requests" list onto
hn->hn_requests, leaving the local list empty, so the subsequent
drain loop runs zero iterations. Pending handshake requests that
had not yet been accepted are not torn down when the net namespace
is destroyed; each one keeps a reference on a socket file and on
the handshake_req allocation.
Pass the source and destination in the documented order
(list_splice_init(list, head) moves list onto head) so the pending
list is transferred to the local scratch list and drained through
handshake_complete().
Fixing the splice direction exposes a list-corruption race. After
the splice each req->hr_list still has non-empty link pointers,
threading the stack-local scratch list rather than hn_requests.
A concurrent handshake_req_cancel() -- for example, from sunrpc's
TLS timeout on a kernel socket whose netns reference was not
taken -- finds the request through the rhashtable, calls
remove_pending(), and sees !list_empty(&req->hr_list).
__remove_pending_locked() then list_del_init()s an entry off the
scratch list while the drain iterates, corrupting it. The same
call arriving after the drain loop has run list_del() on an
entry hits LIST_POISON instead.
Have remove_pending() check HANDSHAKE_F_NET_DRAINING under
hn_lock and report not-found when drain is in progress. The
drain has already taken ownership; handshake_complete()'s
existing test_and_set on HANDSHAKE_F_REQ_COMPLETED still
arbitrates between drain and cancel for who calls the consumer's
hp_done. Use list_del_init() rather than list_del() in the drain
so req->hr_list does not carry LIST_POISON after drain releases
the entry.
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/netlink.c | 4 ++--
net/handshake/request.c | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
index b989456fc4c5..a8dd848371ff 100644
--- a/net/handshake/netlink.c
+++ b/net/handshake/netlink.c
@@ -204,12 +204,12 @@ static void __net_exit handshake_net_exit(struct net *net)
*/
spin_lock(&hn->hn_lock);
set_bit(HANDSHAKE_F_NET_DRAINING, &hn->hn_flags);
- list_splice_init(&requests, &hn->hn_requests);
+ list_splice_init(&hn->hn_requests, &requests);
spin_unlock(&hn->hn_lock);
while (!list_empty(&requests)) {
req = list_first_entry(&requests, struct handshake_req, hr_list);
- list_del(&req->hr_list);
+ list_del_init(&req->hr_list);
/*
* Requests on this list have not yet been
diff --git a/net/handshake/request.c b/net/handshake/request.c
index 2829adbeb149..cebafcc341d2 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -168,7 +168,8 @@ static bool remove_pending(struct handshake_net *hn, struct handshake_req *req)
bool ret = false;
spin_lock(&hn->hn_lock);
- if (!list_empty(&req->hr_list)) {
+ if (!test_bit(HANDSHAKE_F_NET_DRAINING, &hn->hn_flags) &&
+ !list_empty(&req->hr_list)) {
__remove_pending_locked(hn, req);
ret = true;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
2026-05-18 18:24 ` [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 7:35 ` Hannes Reinecke
2026-05-20 2:59 ` Alistair Francis
2026-05-18 18:24 ` [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete() Chuck Lever
` (4 subsequent siblings)
6 siblings, 2 replies; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
nvme_tcp_tls_done() assigns queue->tls_err in three branches. The
ENOKEY lookup failure and the EOPNOTSUPP initializer both store
negative errnos. The third branch, reached when the handshake
layer reports a non-zero status, stores -status.
The handshake layer delivers status to the consumer callback as a
negative errno; the other in-tree consumers --
xs_tls_handshake_done() and the nvmet target callback -- treat
their status argument that way. The extra negation in
nvme_tcp_tls_done() flips the sign, leaving tls_err as a positive
value (for instance, +EIO), which nvme_tcp_start_tls() then
returns to its caller.
Drop the extra negation so queue->tls_err uniformly carries a
negative errno on failure.
Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
drivers/nvme/host/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 15d36d6a728e..68a1d7640494 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1702,7 +1702,7 @@ static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
qid, pskid, status);
if (status) {
- queue->tls_err = -status;
+ queue->tls_err = status;
goto out_complete;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete()
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
2026-05-18 18:24 ` [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit Chuck Lever
2026-05-18 18:24 ` [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 7:47 ` Hannes Reinecke
2026-05-21 2:39 ` Jakub Kicinski
2026-05-18 18:24 ` [PATCH net 4/7] net/handshake: Take a long-lived file reference at submit Chuck Lever
` (3 subsequent siblings)
6 siblings, 2 replies; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
handshake_complete() declares status as unsigned int and
tls_handshake_done() negates that value (-status) before handing
it to the TLS consumer. Consumers match on negative errno
constants -- xs_tls_handshake_done() has
switch (status) {
case 0:
case -EACCES:
case -ETIMEDOUT:
lower_transport->xprt_err = status;
break;
default:
lower_transport->xprt_err = -EACCES;
}
so the API as designed expects callers to pass positive errno
values that the tlshd shim then negates.
Three internal callers in handshake_nl_accept_doit(), the
net-exit drain, and a kunit test follow kernel convention and
pass negative errnos -- -EIO, -ETIMEDOUT, -ETIMEDOUT. The
implicit conversion to unsigned int turns -ETIMEDOUT into
0xFFFFFF92; the subsequent -status in tls_handshake_done()
wraps back to 110, the consumer's switch falls through, and
the xprt reports -EACCES on what should be -ETIMEDOUT or -EIO.
Fix the API rather than the call sites. The natural kernel
convention is negative errno in, negative errno out. Change
handshake_complete() and hp_done to take int status, drop the
negation in tls_handshake_done(), and negate once in
handshake_nl_done_doit() where status arrives from the wire
as an unsigned netlink attribute. The three internal callers
were already correct under that convention and need no change.
At the same wire boundary, clamp values above MAX_ERRNO to the
default -EIO. A well-behaved tlshd never sends such a value,
but negating an unconstrained u32 invokes signed-integer
overflow when the high bit is set, which UBSAN would report.
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/handshake-test.c | 2 +-
net/handshake/handshake.h | 4 ++--
net/handshake/netlink.c | 9 +++++++--
net/handshake/request.c | 2 +-
net/handshake/tlshd.c | 4 ++--
5 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 55442b2f518a..df3948e807a0 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -25,7 +25,7 @@ static int test_accept_func(struct handshake_req *req, struct genl_info *info,
return 0;
}
-static void test_done_func(struct handshake_req *req, unsigned int status,
+static void test_done_func(struct handshake_req *req, int status,
struct genl_info *info)
{
}
diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
index a48163765a7a..2289b0e274f4 100644
--- a/net/handshake/handshake.h
+++ b/net/handshake/handshake.h
@@ -57,7 +57,7 @@ struct handshake_proto {
int (*hp_accept)(struct handshake_req *req,
struct genl_info *info, int fd);
void (*hp_done)(struct handshake_req *req,
- unsigned int status,
+ int status,
struct genl_info *info);
void (*hp_destroy)(struct handshake_req *req);
};
@@ -86,7 +86,7 @@ struct handshake_req *handshake_req_hash_lookup(struct sock *sk);
struct handshake_req *handshake_req_next(struct handshake_net *hn, int class);
int handshake_req_submit(struct socket *sock, struct handshake_req *req,
gfp_t flags);
-void handshake_complete(struct handshake_req *req, unsigned int status,
+void handshake_complete(struct handshake_req *req, int status,
struct genl_info *info);
bool handshake_req_cancel(struct sock *sk);
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
index a8dd848371ff..08e5a5867a90 100644
--- a/net/handshake/netlink.c
+++ b/net/handshake/netlink.c
@@ -159,8 +159,13 @@ int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info)
trace_handshake_cmd_done(net, req, sock->sk, fd);
status = -EIO;
- if (info->attrs[HANDSHAKE_A_DONE_STATUS])
- status = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_STATUS]);
+ if (info->attrs[HANDSHAKE_A_DONE_STATUS]) {
+ u32 val = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_STATUS]);
+
+ /* Values above MAX_ERRNO would overflow on negation. */
+ if (val <= MAX_ERRNO)
+ status = -(int)val;
+ }
handshake_complete(req, status, info);
sockfd_put(sock);
diff --git a/net/handshake/request.c b/net/handshake/request.c
index cebafcc341d2..93cec5f0e16e 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -285,7 +285,7 @@ int handshake_req_submit(struct socket *sock, struct handshake_req *req,
}
EXPORT_SYMBOL(handshake_req_submit);
-void handshake_complete(struct handshake_req *req, unsigned int status,
+void handshake_complete(struct handshake_req *req, int status,
struct genl_info *info)
{
struct sock *sk = req->hr_sk;
diff --git a/net/handshake/tlshd.c b/net/handshake/tlshd.c
index 8f9532a15f43..3dd99982150f 100644
--- a/net/handshake/tlshd.c
+++ b/net/handshake/tlshd.c
@@ -93,7 +93,7 @@ static void tls_handshake_remote_peerids(struct tls_handshake_req *treq,
*
*/
static void tls_handshake_done(struct handshake_req *req,
- unsigned int status, struct genl_info *info)
+ int status, struct genl_info *info)
{
struct tls_handshake_req *treq = handshake_req_private(req);
@@ -104,7 +104,7 @@ static void tls_handshake_done(struct handshake_req *req,
if (!status)
set_bit(HANDSHAKE_F_REQ_SESSION, &req->hr_flags);
- treq->th_consumer_done(treq->th_consumer_data, -status,
+ treq->th_consumer_done(treq->th_consumer_data, status,
treq->th_peerid[0]);
}
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net 4/7] net/handshake: Take a long-lived file reference at submit
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
` (2 preceding siblings ...)
2026-05-18 18:24 ` [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete() Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 7:55 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 5/7] net/handshake: hand off the pinned file reference to accept_doit Chuck Lever
` (2 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
handshake_nl_accept_doit() needs the file pointer backing
req->hr_sk->sk_socket to survive the window between
handshake_req_next() and the subsequent FD_PREPARE() and get_file().
The submit-side sock_hold() does not provide that. sk_refcnt keeps
struct sock alive, but struct socket is owned by sock->file: when
the consumer fputs the last file reference, sock_release() tears
the socket down regardless of any sock_hold.
Prepare for a fix by adding an hr_file pointer to struct
handshake_req and acquiring an explicit reference on sock->file
during handshake_req_submit(). Release the reference on the
completion-bit-winning path of handshake_complete() and
handshake_req_cancel(), and on the submit error paths that destroy
the request before it has been published.
The accept-side dereferences are not yet retargeted; that change
comes in the next patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/handshake.h | 2 ++
net/handshake/netlink.c | 6 ------
net/handshake/request.c | 27 ++++++++++++++++++++++++++-
3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
index 2289b0e274f4..da61cadd1ad3 100644
--- a/net/handshake/handshake.h
+++ b/net/handshake/handshake.h
@@ -24,6 +24,7 @@ enum hn_flags_bits {
HANDSHAKE_F_NET_DRAINING,
};
+struct file;
struct handshake_proto;
/* One handshake request */
@@ -32,6 +33,7 @@ struct handshake_req {
struct rhash_head hr_rhash;
unsigned long hr_flags;
const struct handshake_proto *hr_proto;
+ struct file *hr_file;
struct sock *hr_sk;
void (*hr_odestruct)(struct sock *sk);
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
index 08e5a5867a90..0d5871f4bd67 100644
--- a/net/handshake/netlink.c
+++ b/net/handshake/netlink.c
@@ -215,12 +215,6 @@ static void __net_exit handshake_net_exit(struct net *net)
while (!list_empty(&requests)) {
req = list_first_entry(&requests, struct handshake_req, hr_list);
list_del_init(&req->hr_list);
-
- /*
- * Requests on this list have not yet been
- * accepted, so they do not have an fd to put.
- */
-
handshake_complete(req, -ETIMEDOUT, NULL);
}
}
diff --git a/net/handshake/request.c b/net/handshake/request.c
index 93cec5f0e16e..b076ae539422 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/inet.h>
+#include <linux/file.h>
#include <linux/rhashtable.h>
#include <net/sock.h>
@@ -219,6 +220,12 @@ EXPORT_SYMBOL_IF_KUNIT(handshake_req_next);
* A negative return value from handshake_req_submit() means that
* no completion callback will be done and that @req has been
* destroyed.
+ *
+ * The caller must hold a reference on @sock->file for the duration
+ * of this call. Once the request is published to the accept side, a
+ * concurrent completion or cancellation may release the request's pin on
+ * @sock->file; the caller's reference is what keeps @sock->sk valid until
+ * handshake_req_submit() returns.
*/
int handshake_req_submit(struct socket *sock, struct handshake_req *req,
gfp_t flags)
@@ -237,6 +244,14 @@ int handshake_req_submit(struct socket *sock, struct handshake_req *req,
kfree(req);
return -EINVAL;
}
+
+ /*
+ * Pin sock->file for the lifetime of the request so the
+ * accept side does not race a consumer that releases the
+ * socket while a handshake is pending.
+ */
+ req->hr_file = get_file(sock->file);
+
req->hr_odestruct = req->hr_sk->sk_destruct;
req->hr_sk->sk_destruct = handshake_sk_destruct;
@@ -268,7 +283,11 @@ int handshake_req_submit(struct socket *sock, struct handshake_req *req,
goto out_err;
}
- /* Prevent socket release while a handshake request is pending */
+ /*
+ * Pin struct sock so sk_destruct does not run until the
+ * handshake completion path releases it; struct socket is
+ * held separately via hr_file above.
+ */
sock_hold(req->hr_sk);
trace_handshake_submit(net, req, req->hr_sk);
@@ -280,6 +299,7 @@ int handshake_req_submit(struct socket *sock, struct handshake_req *req,
/* Restore original destructor so socket teardown still runs on failure */
req->hr_sk->sk_destruct = req->hr_odestruct;
trace_handshake_submit_err(net, req, req->hr_sk, ret);
+ fput(req->hr_file);
handshake_req_destroy(req);
return ret;
}
@@ -292,11 +312,15 @@ void handshake_complete(struct handshake_req *req, int status,
struct net *net = sock_net(sk);
if (!test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED, &req->hr_flags)) {
+ struct file *file = req->hr_file;
+
trace_handshake_complete(net, req, sk, status);
req->hr_proto->hp_done(req, status, info);
/* Handshake request is no longer pending */
sock_put(sk);
+
+ fput(file);
}
}
EXPORT_SYMBOL_IF_KUNIT(handshake_complete);
@@ -345,6 +369,7 @@ bool handshake_req_cancel(struct sock *sk)
/* Handshake request is no longer pending */
sock_put(sk);
+ fput(req->hr_file);
return true;
}
EXPORT_SYMBOL(handshake_req_cancel);
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net 5/7] net/handshake: hand off the pinned file reference to accept_doit
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
` (3 preceding siblings ...)
2026-05-18 18:24 ` [PATCH net 4/7] net/handshake: Take a long-lived file reference at submit Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 7:59 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 6/7] net/handshake: Close the submit-side sock_hold race Chuck Lever
2026-05-18 18:24 ` [PATCH net 7/7] net/handshake: Verify file-reference balance in submit paths Chuck Lever
6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
handshake_req_next() removes the request from the per-net
pending list and drops hn_lock before handshake_nl_accept_doit()
reads req->hr_sk->sk_socket and dereferences sock->file (once in
FD_PREPARE() and again in get_file()). In that window a
consumer running tls_handshake_cancel() followed by sockfd_put()
(svc_sock_free) or __fput_sync() (xs_reset_transport) releases
sock->file. sock_release() then runs sock_orphan(), zeroing
sk_socket, and frees the struct socket. The accept-side code
either reads NULL through sk_socket or chases freed memory.
The submit-side sock_hold() does not prevent this. sk_refcnt
protects struct sock, but struct socket and sock->file are
independently refcounted via the file descriptor the consumer
owns. Pinning sk leaves sock and sock->file unprotected.
Retarget the accept-side dereferences at req->hr_file, which was
pinned at submit time, instead of req->hr_sk->sk_socket->file.
Pinning on its own is not sufficient: a consumer that cancels
between handshake_req_next() returning and accept_doit reaching
FD_PREPARE() takes the !remove_pending() branch in
handshake_req_cancel() and drops hr_file before the accept side
takes its own reference. Hand off an additional file reference
inside handshake_req_next(), under hn_lock, so the accept side
operates on a reference that no concurrent handshake_req_cancel()
can revoke. FD_PREPARE() consumes that handed-off reference,
either by transferring it to the new fd in fd_publish() or by
dropping it in the cleanup destructor on error; the explicit
get_file() that previously balanced FD_PREPARE() is therefore
redundant and goes away.
Update handshake_req_cancel_test2 and _test3 to simulate the
FD_PREPARE() consumption with an fput() so the kunit file-count
assertions stay balanced.
Reported-by: Chris Mason <clm@meta.com>
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/handshake-test.c | 8 ++++++++
net/handshake/netlink.c | 7 ++-----
net/handshake/request.c | 18 ++++++++++++++++++
3 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index df3948e807a0..9cc7a95f4120 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -375,6 +375,10 @@ static void handshake_req_cancel_test2(struct kunit *test)
/* Pretend to accept this request */
next = handshake_req_next(hn, HANDSHAKE_HANDLER_CLASS_TLSHD);
KUNIT_ASSERT_PTR_EQ(test, req, next);
+ /* Simulate FD_PREPARE() consuming the file reference handed
+ * off by handshake_req_next(); see handshake_nl_accept_doit().
+ */
+ fput(filp);
/* Act */
result = handshake_req_cancel(sock->sk);
@@ -417,6 +421,10 @@ static void handshake_req_cancel_test3(struct kunit *test)
/* Pretend to accept this request */
next = handshake_req_next(hn, HANDSHAKE_HANDLER_CLASS_TLSHD);
KUNIT_ASSERT_PTR_EQ(test, req, next);
+ /* Simulate FD_PREPARE() consuming the file reference handed
+ * off by handshake_req_next(); see handshake_nl_accept_doit().
+ */
+ fput(filp);
/* Pretend to complete this request */
handshake_complete(next, -ETIMEDOUT, NULL);
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
index 0d5871f4bd67..086d5439c155 100644
--- a/net/handshake/netlink.c
+++ b/net/handshake/netlink.c
@@ -92,7 +92,6 @@ int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info)
struct net *net = sock_net(skb->sk);
struct handshake_net *hn = handshake_pernet(net);
struct handshake_req *req = NULL;
- struct socket *sock;
int class, err;
err = -EOPNOTSUPP;
@@ -107,15 +106,13 @@ int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info)
err = -EAGAIN;
req = handshake_req_next(hn, class);
if (req) {
- sock = req->hr_sk->sk_socket;
-
- FD_PREPARE(fdf, O_CLOEXEC, sock->file);
+ FD_PREPARE(fdf, O_CLOEXEC, req->hr_file);
if (fdf.err) {
+ fput(req->hr_file); /* drop ref from handshake_req_next() */
err = fdf.err;
goto out_complete;
}
- get_file(sock->file); /* FD_PREPARE() consumes a reference. */
err = req->hr_proto->hp_accept(req, info, fd_prepare_fd(fdf));
if (err)
goto out_complete; /* Automatic cleanup handles fput */
diff --git a/net/handshake/request.c b/net/handshake/request.c
index b076ae539422..d24b55af1a0a 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -179,6 +179,17 @@ static bool remove_pending(struct handshake_net *hn, struct handshake_req *req)
return ret;
}
+/**
+ * handshake_req_next - Return the next queued handshake request
+ * @hn: per-net handshake state
+ * @class: handler class to match
+ *
+ * On a non-NULL return, the caller owns an extra reference
+ * on @req->hr_file. FD_PREPARE() consumes it on success; on
+ * the FD_PREPARE() failure path the caller must fput() it.
+ *
+ * Return: pointer to a removed handshake_req, or NULL.
+ */
struct handshake_req *handshake_req_next(struct handshake_net *hn, int class)
{
struct handshake_req *req, *pos;
@@ -189,6 +200,13 @@ struct handshake_req *handshake_req_next(struct handshake_net *hn, int class)
if (pos->hr_proto->hp_handler_class != class)
continue;
__remove_pending_locked(hn, pos);
+ /* Hand off a file reference to the accept side under
+ * hn_lock. A concurrent handshake_req_cancel() can drop
+ * hr_file before accept reaches FD_PREPARE(); this extra
+ * reference keeps the file alive until FD_PREPARE() takes
+ * ownership.
+ */
+ get_file(pos->hr_file);
req = pos;
break;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net 6/7] net/handshake: Close the submit-side sock_hold race
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
` (4 preceding siblings ...)
2026-05-18 18:24 ` [PATCH net 5/7] net/handshake: hand off the pinned file reference to accept_doit Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 8:01 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 7/7] net/handshake: Verify file-reference balance in submit paths Chuck Lever
6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
handshake_req_submit() publishes the request via
handshake_req_hash_add() and __add_pending_locked(), drops
hn_lock, and calls handshake_genl_notify() (which can sleep)
before taking sock_hold() on req->hr_sk. A fast tlshd ACCEPT
followed by DONE can drive handshake_complete()'s sock_put()
into the window between the spin_unlock and the late
sock_hold(); on a system where the consumer's fd held the
only sk reference, the late sock_hold() then operates on an
sk whose refcount has reached zero.
The preceding two patches install an explicit file reference
on struct handshake_req. That file pins sock->file, which
pins the embedded struct socket, which defers inet_release()'s
sock_put(). As long as hr_file is held, sk cannot reach refcount
zero from the consumer side, and the submit-side sock_hold()
with its matching sock_put() calls in handshake_complete() and
handshake_req_cancel() is now redundant.
Drop all three. Submit's pinning is strictly the file reference
acquired earlier in the function, and the lifetime story is
contained in a single get_file()/fput() pair.
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/request.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/net/handshake/request.c b/net/handshake/request.c
index d24b55af1a0a..1c669c1602b5 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -301,13 +301,6 @@ int handshake_req_submit(struct socket *sock, struct handshake_req *req,
goto out_err;
}
- /*
- * Pin struct sock so sk_destruct does not run until the
- * handshake completion path releases it; struct socket is
- * held separately via hr_file above.
- */
- sock_hold(req->hr_sk);
-
trace_handshake_submit(net, req, req->hr_sk);
return 0;
@@ -335,9 +328,6 @@ void handshake_complete(struct handshake_req *req, int status,
trace_handshake_complete(net, req, sk, status);
req->hr_proto->hp_done(req, status, info);
- /* Handshake request is no longer pending */
- sock_put(sk);
-
fput(file);
}
}
@@ -385,8 +375,6 @@ bool handshake_req_cancel(struct sock *sk)
out_true:
trace_handshake_cancel(net, req, sk);
- /* Handshake request is no longer pending */
- sock_put(sk);
fput(req->hr_file);
return true;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net 7/7] net/handshake: Verify file-reference balance in submit paths
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
` (5 preceding siblings ...)
2026-05-18 18:24 ` [PATCH net 6/7] net/handshake: Close the submit-side sock_hold race Chuck Lever
@ 2026-05-18 18:24 ` Chuck Lever
2026-05-19 8:03 ` Hannes Reinecke
6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-05-18 18:24 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The new file-reference contract on struct handshake_req is silently
breakable: a missing get_file() at submit or a missing fput() on an
error path leaves the file leaked but does not crash the test, so
the existing absence-of-crash checks pass either way.
Snapshot file_count(filp) before each handshake_req_submit() in
the submit-success, EAGAIN, EBUSY, and cancel tests, and assert
the expected balance after submit and again after cancel. The
already-completed cancel test also asserts the post-complete
balance, which pins down that handshake_complete() drops the
reference and that the subsequent cancel does not double-fput.
The destroy test gets the same treatment before __fput_sync(),
which double-checks that cancel's fput() ran and the only
remaining reference is the one sock_alloc_file() established.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/handshake-test.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 9cc7a95f4120..3dd507470d5f 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -208,6 +208,7 @@ static void handshake_req_submit_test3(struct kunit *test)
static void handshake_req_submit_test4(struct kunit *test)
{
struct handshake_req *req, *result;
+ unsigned long fcount_before;
struct socket *sock;
struct file *filp;
int err;
@@ -224,8 +225,10 @@ static void handshake_req_submit_test4(struct kunit *test)
KUNIT_ASSERT_NOT_NULL(test, sock->sk);
sock->file = filp;
+ fcount_before = file_count(filp);
err = handshake_req_submit(sock, req, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, err, 0);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before + 1);
/* Act */
result = handshake_req_hash_lookup(sock->sk);
@@ -235,11 +238,13 @@ static void handshake_req_submit_test4(struct kunit *test)
KUNIT_EXPECT_PTR_EQ(test, req, result);
handshake_req_cancel(sock->sk);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
fput(filp);
}
static void handshake_req_submit_test5(struct kunit *test)
{
+ unsigned long fcount_before;
struct handshake_req *req;
struct handshake_net *hn;
struct socket *sock;
@@ -265,12 +270,14 @@ static void handshake_req_submit_test5(struct kunit *test)
saved = hn->hn_pending;
hn->hn_pending = hn->hn_pending_max + 1;
+ fcount_before = file_count(filp);
/* Act */
err = handshake_req_submit(sock, req, GFP_KERNEL);
/* Assert */
KUNIT_EXPECT_EQ(test, err, -EAGAIN);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
fput(filp);
hn->hn_pending = saved;
@@ -279,6 +286,7 @@ static void handshake_req_submit_test5(struct kunit *test)
static void handshake_req_submit_test6(struct kunit *test)
{
struct handshake_req *req1, *req2;
+ unsigned long fcount_before;
struct socket *sock;
struct file *filp;
int err;
@@ -296,21 +304,26 @@ static void handshake_req_submit_test6(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
KUNIT_ASSERT_NOT_NULL(test, sock->sk);
sock->file = filp;
+ fcount_before = file_count(filp);
/* Act */
err = handshake_req_submit(sock, req1, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, err, 0);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before + 1);
err = handshake_req_submit(sock, req2, GFP_KERNEL);
/* Assert */
KUNIT_EXPECT_EQ(test, err, -EBUSY);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before + 1);
handshake_req_cancel(sock->sk);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
fput(filp);
}
static void handshake_req_cancel_test1(struct kunit *test)
{
+ unsigned long fcount_before;
struct handshake_req *req;
struct socket *sock;
struct file *filp;
@@ -329,8 +342,10 @@ static void handshake_req_cancel_test1(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
sock->file = filp;
+ fcount_before = file_count(filp);
err = handshake_req_submit(sock, req, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, err, 0);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before + 1);
/* NB: handshake_req hasn't been accepted */
@@ -339,12 +354,14 @@ static void handshake_req_cancel_test1(struct kunit *test)
/* Assert */
KUNIT_EXPECT_TRUE(test, result);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
fput(filp);
}
static void handshake_req_cancel_test2(struct kunit *test)
{
+ unsigned long fcount_before;
struct handshake_req *req, *next;
struct handshake_net *hn;
struct socket *sock;
@@ -365,8 +382,10 @@ static void handshake_req_cancel_test2(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
sock->file = filp;
+ fcount_before = file_count(filp);
err = handshake_req_submit(sock, req, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, err, 0);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before + 1);
net = sock_net(sock->sk);
hn = handshake_pernet(net);
@@ -385,12 +404,14 @@ static void handshake_req_cancel_test2(struct kunit *test)
/* Assert */
KUNIT_EXPECT_TRUE(test, result);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
fput(filp);
}
static void handshake_req_cancel_test3(struct kunit *test)
{
+ unsigned long fcount_before;
struct handshake_req *req, *next;
struct handshake_net *hn;
struct socket *sock;
@@ -411,8 +432,10 @@ static void handshake_req_cancel_test3(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
sock->file = filp;
+ fcount_before = file_count(filp);
err = handshake_req_submit(sock, req, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, err, 0);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before + 1);
net = sock_net(sock->sk);
hn = handshake_pernet(net);
@@ -428,12 +451,14 @@ static void handshake_req_cancel_test3(struct kunit *test)
/* Pretend to complete this request */
handshake_complete(next, -ETIMEDOUT, NULL);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
/* Act */
result = handshake_req_cancel(sock->sk);
/* Assert */
KUNIT_EXPECT_FALSE(test, result);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
fput(filp);
}
@@ -454,6 +479,7 @@ static struct handshake_proto handshake_req_alloc_proto_destroy = {
static void handshake_req_destroy_test1(struct kunit *test)
{
+ unsigned long fcount_before;
struct handshake_req *req;
struct socket *sock;
struct file *filp;
@@ -473,10 +499,12 @@ static void handshake_req_destroy_test1(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
sock->file = filp;
+ fcount_before = file_count(filp);
err = handshake_req_submit(sock, req, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, err, 0);
handshake_req_cancel(sock->sk);
+ KUNIT_EXPECT_EQ(test, file_count(filp), fcount_before);
/* Act */
/* Ensure the close/release/put process has run to
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit
2026-05-18 18:24 ` [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit Chuck Lever
@ 2026-05-19 7:33 ` Hannes Reinecke
0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 7:33 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The arguments to list_splice_init() in handshake_net_exit() are
> reversed. The call moves the local empty "requests" list onto
> hn->hn_requests, leaving the local list empty, so the subsequent
> drain loop runs zero iterations. Pending handshake requests that
> had not yet been accepted are not torn down when the net namespace
> is destroyed; each one keeps a reference on a socket file and on
> the handshake_req allocation.
>
> Pass the source and destination in the documented order
> (list_splice_init(list, head) moves list onto head) so the pending
> list is transferred to the local scratch list and drained through
> handshake_complete().
>
> Fixing the splice direction exposes a list-corruption race. After
> the splice each req->hr_list still has non-empty link pointers,
> threading the stack-local scratch list rather than hn_requests.
> A concurrent handshake_req_cancel() -- for example, from sunrpc's
> TLS timeout on a kernel socket whose netns reference was not
> taken -- finds the request through the rhashtable, calls
> remove_pending(), and sees !list_empty(&req->hr_list).
> __remove_pending_locked() then list_del_init()s an entry off the
> scratch list while the drain iterates, corrupting it. The same
> call arriving after the drain loop has run list_del() on an
> entry hits LIST_POISON instead.
>
> Have remove_pending() check HANDSHAKE_F_NET_DRAINING under
> hn_lock and report not-found when drain is in progress. The
> drain has already taken ownership; handshake_complete()'s
> existing test_and_set on HANDSHAKE_F_REQ_COMPLETED still
> arbitrates between drain and cancel for who calls the consumer's
> hp_done. Use list_del_init() rather than list_del() in the drain
> so req->hr_list does not carry LIST_POISON after drain releases
> the entry.
>
> Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/handshake/netlink.c | 4 ++--
> net/handshake/request.c | 3 ++-
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err
2026-05-18 18:24 ` [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err Chuck Lever
@ 2026-05-19 7:35 ` Hannes Reinecke
2026-05-20 2:59 ` Alistair Francis
1 sibling, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 7:35 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> nvme_tcp_tls_done() assigns queue->tls_err in three branches. The
> ENOKEY lookup failure and the EOPNOTSUPP initializer both store
> negative errnos. The third branch, reached when the handshake
> layer reports a non-zero status, stores -status.
>
> The handshake layer delivers status to the consumer callback as a
> negative errno; the other in-tree consumers --
> xs_tls_handshake_done() and the nvmet target callback -- treat
> their status argument that way. The extra negation in
> nvme_tcp_tls_done() flips the sign, leaving tls_err as a positive
> value (for instance, +EIO), which nvme_tcp_start_tls() then
> returns to its caller.
>
> Drop the extra negation so queue->tls_err uniformly carries a
> negative errno on failure.
>
> Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> drivers/nvme/host/tcp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete()
2026-05-18 18:24 ` [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete() Chuck Lever
@ 2026-05-19 7:47 ` Hannes Reinecke
2026-05-21 2:39 ` Jakub Kicinski
1 sibling, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 7:47 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> handshake_complete() declares status as unsigned int and
> tls_handshake_done() negates that value (-status) before handing
> it to the TLS consumer. Consumers match on negative errno
> constants -- xs_tls_handshake_done() has
>
> switch (status) {
> case 0:
> case -EACCES:
> case -ETIMEDOUT:
> lower_transport->xprt_err = status;
> break;
> default:
> lower_transport->xprt_err = -EACCES;
> }
>
> so the API as designed expects callers to pass positive errno
> values that the tlshd shim then negates.
>
> Three internal callers in handshake_nl_accept_doit(), the
> net-exit drain, and a kunit test follow kernel convention and
> pass negative errnos -- -EIO, -ETIMEDOUT, -ETIMEDOUT. The
> implicit conversion to unsigned int turns -ETIMEDOUT into
> 0xFFFFFF92; the subsequent -status in tls_handshake_done()
> wraps back to 110, the consumer's switch falls through, and
> the xprt reports -EACCES on what should be -ETIMEDOUT or -EIO.
>
> Fix the API rather than the call sites. The natural kernel
> convention is negative errno in, negative errno out. Change
> handshake_complete() and hp_done to take int status, drop the
> negation in tls_handshake_done(), and negate once in
> handshake_nl_done_doit() where status arrives from the wire
> as an unsigned netlink attribute. The three internal callers
> were already correct under that convention and need no change.
>
> At the same wire boundary, clamp values above MAX_ERRNO to the
> default -EIO. A well-behaved tlshd never sends such a value,
> but negating an unconstrained u32 invokes signed-integer
> overflow when the high bit is set, which UBSAN would report.
>
> Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/handshake/handshake-test.c | 2 +-
> net/handshake/handshake.h | 4 ++--
> net/handshake/netlink.c | 9 +++++++--
> net/handshake/request.c | 2 +-
> net/handshake/tlshd.c | 4 ++--
> 5 files changed, 13 insertions(+), 8 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannex
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 4/7] net/handshake: Take a long-lived file reference at submit
2026-05-18 18:24 ` [PATCH net 4/7] net/handshake: Take a long-lived file reference at submit Chuck Lever
@ 2026-05-19 7:55 ` Hannes Reinecke
0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 7:55 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> handshake_nl_accept_doit() needs the file pointer backing
> req->hr_sk->sk_socket to survive the window between
> handshake_req_next() and the subsequent FD_PREPARE() and get_file().
> The submit-side sock_hold() does not provide that. sk_refcnt keeps
> struct sock alive, but struct socket is owned by sock->file: when
> the consumer fputs the last file reference, sock_release() tears
> the socket down regardless of any sock_hold.
>
> Prepare for a fix by adding an hr_file pointer to struct
> handshake_req and acquiring an explicit reference on sock->file
> during handshake_req_submit(). Release the reference on the
> completion-bit-winning path of handshake_complete() and
> handshake_req_cancel(), and on the submit error paths that destroy
> the request before it has been published.
>
> The accept-side dereferences are not yet retargeted; that change
> comes in the next patch.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/handshake/handshake.h | 2 ++
> net/handshake/netlink.c | 6 ------
> net/handshake/request.c | 27 ++++++++++++++++++++++++++-
> 3 files changed, 28 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 5/7] net/handshake: hand off the pinned file reference to accept_doit
2026-05-18 18:24 ` [PATCH net 5/7] net/handshake: hand off the pinned file reference to accept_doit Chuck Lever
@ 2026-05-19 7:59 ` Hannes Reinecke
0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 7:59 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> handshake_req_next() removes the request from the per-net
> pending list and drops hn_lock before handshake_nl_accept_doit()
> reads req->hr_sk->sk_socket and dereferences sock->file (once in
> FD_PREPARE() and again in get_file()). In that window a
> consumer running tls_handshake_cancel() followed by sockfd_put()
> (svc_sock_free) or __fput_sync() (xs_reset_transport) releases
> sock->file. sock_release() then runs sock_orphan(), zeroing
> sk_socket, and frees the struct socket. The accept-side code
> either reads NULL through sk_socket or chases freed memory.
>
> The submit-side sock_hold() does not prevent this. sk_refcnt
> protects struct sock, but struct socket and sock->file are
> independently refcounted via the file descriptor the consumer
> owns. Pinning sk leaves sock and sock->file unprotected.
>
> Retarget the accept-side dereferences at req->hr_file, which was
> pinned at submit time, instead of req->hr_sk->sk_socket->file.
> Pinning on its own is not sufficient: a consumer that cancels
> between handshake_req_next() returning and accept_doit reaching
> FD_PREPARE() takes the !remove_pending() branch in
> handshake_req_cancel() and drops hr_file before the accept side
> takes its own reference. Hand off an additional file reference
> inside handshake_req_next(), under hn_lock, so the accept side
> operates on a reference that no concurrent handshake_req_cancel()
> can revoke. FD_PREPARE() consumes that handed-off reference,
> either by transferring it to the new fd in fd_publish() or by
> dropping it in the cleanup destructor on error; the explicit
> get_file() that previously balanced FD_PREPARE() is therefore
> redundant and goes away.
>
> Update handshake_req_cancel_test2 and _test3 to simulate the
> FD_PREPARE() consumption with an fput() so the kunit file-count
> assertions stay balanced.
>
> Reported-by: Chris Mason <clm@meta.com>
> Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/handshake/handshake-test.c | 8 ++++++++
> net/handshake/netlink.c | 7 ++-----
> net/handshake/request.c | 18 ++++++++++++++++++
> 3 files changed, 28 insertions(+), 5 deletions(-)
>
I _think_ this should fix the issue I had reported about a kernel oops
in __sys_recvfrom() in tlshd.
Which has all the symptoms described here.
So glad that it's been addressed.
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 6/7] net/handshake: Close the submit-side sock_hold race
2026-05-18 18:24 ` [PATCH net 6/7] net/handshake: Close the submit-side sock_hold race Chuck Lever
@ 2026-05-19 8:01 ` Hannes Reinecke
0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 8:01 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> handshake_req_submit() publishes the request via
> handshake_req_hash_add() and __add_pending_locked(), drops
> hn_lock, and calls handshake_genl_notify() (which can sleep)
> before taking sock_hold() on req->hr_sk. A fast tlshd ACCEPT
> followed by DONE can drive handshake_complete()'s sock_put()
> into the window between the spin_unlock and the late
> sock_hold(); on a system where the consumer's fd held the
> only sk reference, the late sock_hold() then operates on an
> sk whose refcount has reached zero.
>
> The preceding two patches install an explicit file reference
> on struct handshake_req. That file pins sock->file, which
> pins the embedded struct socket, which defers inet_release()'s
> sock_put(). As long as hr_file is held, sk cannot reach refcount
> zero from the consumer side, and the submit-side sock_hold()
> with its matching sock_put() calls in handshake_complete() and
> handshake_req_cancel() is now redundant.
>
> Drop all three. Submit's pinning is strictly the file reference
> acquired earlier in the function, and the lifetime story is
> contained in a single get_file()/fput() pair.
>
> Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/handshake/request.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 7/7] net/handshake: Verify file-reference balance in submit paths
2026-05-18 18:24 ` [PATCH net 7/7] net/handshake: Verify file-reference balance in submit paths Chuck Lever
@ 2026-05-19 8:03 ` Hannes Reinecke
0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2026-05-19 8:03 UTC (permalink / raw)
To: Chuck Lever, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On 5/18/26 20:24, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The new file-reference contract on struct handshake_req is silently
> breakable: a missing get_file() at submit or a missing fput() on an
> error path leaves the file leaked but does not crash the test, so
> the existing absence-of-crash checks pass either way.
>
> Snapshot file_count(filp) before each handshake_req_submit() in
> the submit-success, EAGAIN, EBUSY, and cancel tests, and assert
> the expected balance after submit and again after cancel. The
> already-completed cancel test also asserts the post-complete
> balance, which pins down that handshake_complete() drops the
> reference and that the subsequent cancel does not double-fput.
> The destroy test gets the same treatment before __fput_sync(),
> which double-checks that cancel's fput() ran and the only
> remaining reference is the one sock_alloc_file() established.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/handshake/handshake-test.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
One wonders why this is not part of the kunit test suite.
But that's probably neither here nor there.
So:
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err
2026-05-18 18:24 ` [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err Chuck Lever
2026-05-19 7:35 ` Hannes Reinecke
@ 2026-05-20 2:59 ` Alistair Francis
1 sibling, 0 replies; 17+ messages in thread
From: Alistair Francis @ 2026-05-20 2:59 UTC (permalink / raw)
To: Chuck Lever
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Chris Mason, Christian Brauner,
kernel-tls-handshake, linux-nvme, netdev, Chuck Lever
On Tue, May 19, 2026 at 4:29 AM Chuck Lever <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> nvme_tcp_tls_done() assigns queue->tls_err in three branches. The
> ENOKEY lookup failure and the EOPNOTSUPP initializer both store
> negative errnos. The third branch, reached when the handshake
> layer reports a non-zero status, stores -status.
>
> The handshake layer delivers status to the consumer callback as a
> negative errno; the other in-tree consumers --
> xs_tls_handshake_done() and the nvmet target callback -- treat
> their status argument that way. The extra negation in
> nvme_tcp_tls_done() flips the sign, leaving tls_err as a positive
> value (for instance, +EIO), which nvme_tcp_start_tls() then
> returns to its caller.
>
> Drop the extra negation so queue->tls_err uniformly carries a
> negative errno on failure.
>
> Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> drivers/nvme/host/tcp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 15d36d6a728e..68a1d7640494 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1702,7 +1702,7 @@ static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
> qid, pskid, status);
>
> if (status) {
> - queue->tls_err = -status;
> + queue->tls_err = status;
> goto out_complete;
> }
>
>
> --
> 2.54.0
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete()
2026-05-18 18:24 ` [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete() Chuck Lever
2026-05-19 7:47 ` Hannes Reinecke
@ 2026-05-21 2:39 ` Jakub Kicinski
1 sibling, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-21 2:39 UTC (permalink / raw)
To: Chuck Lever
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Chris Mason, Christian Brauner, kernel-tls-handshake, linux-nvme,
netdev, Chuck Lever
On Mon, 18 May 2026 14:24:30 -0400 Chuck Lever wrote:
> - if (info->attrs[HANDSHAKE_A_DONE_STATUS])
> - status = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_STATUS]);
> + if (info->attrs[HANDSHAKE_A_DONE_STATUS]) {
> + u32 val = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_STATUS]);
> +
> + /* Values above MAX_ERRNO would overflow on negation. */
> + if (val <= MAX_ERRNO)
> + status = -(int)val;
> + }
Why not set a policy so that user space can't pass invalid stuff?
(note - sashiko has some more comments but AFAICT they are either
addressed throughout the series or false positives)
diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
index 95c3fade7a8d..1024297b3851 100644
--- a/Documentation/netlink/specs/handshake.yaml
+++ b/Documentation/netlink/specs/handshake.yaml
@@ -12,6 +12,12 @@ protocol: genetlink
doc: Netlink protocol to request a transport layer security handshake.
definitions:
+ -
+ type: const
+ name: max-errno
+ value: 4095
+ header: linux/err.h
+ scope: kernel
-
type: enum
name: handler-class
@@ -80,6 +86,8 @@ doc: Netlink protocol to request a transport layer security handshake.
-
name: status
type: u32
+ checks:
+ max: max-errno
-
name: sockfd
type: s32
diff --git a/net/handshake/genl.h b/net/handshake/genl.h
index 8d3e18672daf..46b65f131669 100644
--- a/net/handshake/genl.h
+++ b/net/handshake/genl.h
@@ -11,6 +11,7 @@
#include <net/genetlink.h>
#include <uapi/linux/handshake.h>
+#include <linux/err.h>
int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info);
int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/handshake/genl.c b/net/handshake/genl.c
index 870612609491..4b20cd9cdd0e 100644
--- a/net/handshake/genl.c
+++ b/net/handshake/genl.c
@@ -10,6 +10,7 @@
#include "genl.h"
#include <uapi/linux/handshake.h>
+#include <linux/err.h>
/* HANDSHAKE_CMD_ACCEPT - do */
static const struct nla_policy handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HANDLER_CLASS + 1] = {
@@ -18,7 +19,7 @@ static const struct nla_policy handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HAN
/* HANDSHAKE_CMD_DONE - do */
static const struct nla_policy handshake_done_nl_policy[HANDSHAKE_A_DONE_REMOTE_AUTH + 1] = {
- [HANDSHAKE_A_DONE_STATUS] = { .type = NLA_U32, },
+ [HANDSHAKE_A_DONE_STATUS] = NLA_POLICY_MAX(NLA_U32, MAX_ERRNO),
[HANDSHAKE_A_DONE_SOCKFD] = { .type = NLA_S32, },
[HANDSHAKE_A_DONE_REMOTE_AUTH] = { .type = NLA_U32, },
};
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-05-21 2:39 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 18:24 [PATCH net 0/7] net/handshake: anchor request lifetime to a pinned file reference Chuck Lever
2026-05-18 18:24 ` [PATCH net 1/7] net/handshake: Drain pending requests at net namespace exit Chuck Lever
2026-05-19 7:33 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 2/7] nvme-tcp: store negative errno in queue->tls_err Chuck Lever
2026-05-19 7:35 ` Hannes Reinecke
2026-05-20 2:59 ` Alistair Francis
2026-05-18 18:24 ` [PATCH net 3/7] net/handshake: Pass negative errno through handshake_complete() Chuck Lever
2026-05-19 7:47 ` Hannes Reinecke
2026-05-21 2:39 ` Jakub Kicinski
2026-05-18 18:24 ` [PATCH net 4/7] net/handshake: Take a long-lived file reference at submit Chuck Lever
2026-05-19 7:55 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 5/7] net/handshake: hand off the pinned file reference to accept_doit Chuck Lever
2026-05-19 7:59 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 6/7] net/handshake: Close the submit-side sock_hold race Chuck Lever
2026-05-19 8:01 ` Hannes Reinecke
2026-05-18 18:24 ` [PATCH net 7/7] net/handshake: Verify file-reference balance in submit paths Chuck Lever
2026-05-19 8:03 ` Hannes Reinecke
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox