* [PATCH 1/9] handshake: Require admin permission for DONE command
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-06 12:20 ` Jeff Layton
2026-06-05 17:34 ` [PATCH 2/9] handshake: Add tags to "done" downcall Chuck Lever
` (8 subsequent siblings)
9 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
ACCEPT and DONE are the two downcalls of the handshake genl
family, both intended for use by the trusted handshake agent
(tlshd). ACCEPT already requires GENL_ADMIN_PERM; DONE has
no privilege check at all.
The fd-lookup in handshake_nl_done_doit() only confirms that
some pending handshake request exists for the supplied sockfd;
it does not authenticate the sender. An unprivileged process
that guesses or observes a valid sockfd can therefore submit
a DONE with HANDSHAKE_A_DONE_STATUS == 0, leaving the kernel
consumer to proceed as if the handshake succeeded. A non-zero
status on a forged DONE tears down a legitimate in-flight
handshake before tlshd can report its real result.
A subsequent patch teaches the DONE handler to carry session
tags consumed for access control. That work makes closing the
existing gap a prerequisite, but the gap itself predates tags.
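The effect of the added flag can be pictured with a small userspace
sketch. Everything here is hypothetical (the SKETCH_* flag values, the
sketch_op table entry, and sketch_dispatch()); it only assumes that the
dispatcher consults the per-op flags and refuses an unprivileged sender
before the handler runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-op flag bits; values are illustrative. */
#define SKETCH_ADMIN_PERM 0x01u
#define SKETCH_CAP_DO     0x02u

struct sketch_op {
	int cmd;
	unsigned int flags;
	int (*doit)(int sockfd, unsigned int status);
};

/* Stand-in for the DONE handler: it only reports that it ran. */
static int sketch_done_doit(int sockfd, unsigned int status)
{
	(void)sockfd;
	(void)status;
	return 0;
}

/*
 * Dispatch one request. When the op carries the admin flag and the
 * sender lacks the privilege, refuse before the handler sees the message.
 */
static int sketch_dispatch(const struct sketch_op *op, bool sender_is_admin,
			   int sockfd, unsigned int status)
{
	assert(op != NULL && op->doit != NULL);
	if ((op->flags & SKETCH_ADMIN_PERM) && !sender_is_admin)
		return -EPERM;
	return op->doit(sockfd, status);
}
```

With only SKETCH_CAP_DO set, an unprivileged sender reaches the
handler; once SKETCH_ADMIN_PERM is OR-ed in, the same request fails
with -EPERM while a privileged sender is unaffected.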
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Documentation/netlink/specs/handshake.yaml | 1 +
net/handshake/genl.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
index 95c3fade7a8d..24f5a0ac5920 100644
--- a/Documentation/netlink/specs/handshake.yaml
+++ b/Documentation/netlink/specs/handshake.yaml
@@ -117,6 +117,7 @@ operations:
name: done
doc: Handler reports handshake completion
attribute-set: done
+ flags: [admin-perm]
do:
request:
attributes:
diff --git a/net/handshake/genl.c b/net/handshake/genl.c
index 870612609491..791c45671cd6 100644
--- a/net/handshake/genl.c
+++ b/net/handshake/genl.c
@@ -37,7 +37,7 @@ static const struct genl_split_ops handshake_nl_ops[] = {
.doit = handshake_nl_done_doit,
.policy = handshake_done_nl_policy,
.maxattr = HANDSHAKE_A_DONE_REMOTE_AUTH,
- .flags = GENL_CMD_CAP_DO,
+ .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
};
--
2.54.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/9] handshake: Require admin permission for DONE command
2026-06-05 17:34 ` [PATCH 1/9] handshake: Require admin permission for DONE command Chuck Lever
@ 2026-06-06 12:20 ` Jeff Layton
0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2026-06-06 12:20 UTC (permalink / raw)
To: Chuck Lever, Donald Hunter, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan, Andrew Morton, John Fastabend, Sabrina Dubroca,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
On Fri, 2026-06-05 at 13:34 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> ACCEPT and DONE are the two downcalls of the handshake genl
> family, both intended for use by the trusted handshake agent
> (tlshd). ACCEPT already requires GENL_ADMIN_PERM; DONE has
> no privilege check at all.
>
> The fd-lookup in handshake_nl_done_doit() only confirms that
> some pending handshake request exists for the supplied sockfd;
> it does not authenticate the sender. An unprivileged process
> that guesses or observes a valid sockfd can therefore submit
> a DONE with HANDSHAKE_A_DONE_STATUS == 0, leaving the kernel
> consumer to proceed as if the handshake succeeded. A non-zero
> status on a forged DONE tears down a legitimate in-flight
> handshake before tlshd can report its real result.
>
> A subsequent patch teaches the DONE handler to carry session
> tags consumed for access control. That work makes closing the
> existing gap a prerequisite, but the gap itself predates tags.
>
> Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> Documentation/netlink/specs/handshake.yaml | 1 +
> net/handshake/genl.c | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
> index 95c3fade7a8d..24f5a0ac5920 100644
> --- a/Documentation/netlink/specs/handshake.yaml
> +++ b/Documentation/netlink/specs/handshake.yaml
> @@ -117,6 +117,7 @@ operations:
> name: done
> doc: Handler reports handshake completion
> attribute-set: done
> + flags: [admin-perm]
> do:
> request:
> attributes:
> diff --git a/net/handshake/genl.c b/net/handshake/genl.c
> index 870612609491..791c45671cd6 100644
> --- a/net/handshake/genl.c
> +++ b/net/handshake/genl.c
> @@ -37,7 +37,7 @@ static const struct genl_split_ops handshake_nl_ops[] = {
> .doit = handshake_nl_done_doit,
> .policy = handshake_done_nl_policy,
> .maxattr = HANDSHAKE_A_DONE_REMOTE_AUTH,
> - .flags = GENL_CMD_CAP_DO,
> + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> },
> };
>
Seems like this ought to go in ahead of the rest of the set? tlshd
generally runs as root anyway so I don't foresee a problem just doing
this:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/9] handshake: Add tags to "done" downcall
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
2026-06-05 17:34 ` [PATCH 1/9] handshake: Require admin permission for DONE command Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 3/9] lib: Add a "tagset" data structure Chuck Lever
` (7 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
We'd like tlshd to tag certificates according to admin-defined
characteristics. The tag list is to be returned on a successful
handshake. Upper Layer Protocols (such as NFS) can then authorize
access based on the set of tags returned to the kernel.
For example, suppose NFSD wants to restrict access to an export to
only clients that present certificates whose issuer DN contains
"O=Oracle". tlshd can parse incoming certificates, and add an
"oraclegroup" tag to handshakes where a client presents a
certificate with "O=Oracle" somewhere in its Issuer field. NFSD can
then be configured to look for that tag and permit access only when
it is present. NFSD needs no knowledge of X.509 certificates.
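The division of labor in that example can be sketched in userspace C.
The names sketch_rule, sketch_tag_for_issuer() and
sketch_export_permits() are hypothetical and are not part of tlshd or
NFSD. The match is a plain substring test on an Issuer string that is
assumed to be already extracted; a real policy would likely parse the
DN rather than use strstr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy rule: a substring to look for and the tag it earns. */
struct sketch_rule {
	const char *issuer_substr;
	const char *tag;
};

/*
 * Agent side: return the tag earned by @issuer_dn under @rule, or NULL
 * when the rule does not match. This is a naive substring match.
 */
static const char *sketch_tag_for_issuer(const struct sketch_rule *rule,
					 const char *issuer_dn)
{
	assert(rule != NULL && issuer_dn != NULL);
	return strstr(issuer_dn, rule->issuer_substr) ? rule->tag : NULL;
}

/*
 * Consumer side: permit access only when the required tag was delivered.
 * No certificate fields are examined here, only tag names.
 */
static int sketch_export_permits(const char *const *session_tags, size_t n,
				 const char *required)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(session_tags[i], required) == 0)
			return 1;
	return 0;
}
```

The consumer half compares tag names only, which is the point of the
design: certificate policy stays in the agent's configuration.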
This patch plumbs in the netlink protocol elements for tlshd to
return a list of tags to the kernel when a TLS or QUIC handshake
succeeds. Subsequent patches add tag extraction and storage in
the handshake layer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Documentation/netlink/specs/handshake.yaml | 11 +++++++++++
include/uapi/linux/handshake.h | 3 +++
net/handshake/genl.c | 5 +++--
3 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
index 24f5a0ac5920..df36ff7da18f 100644
--- a/Documentation/netlink/specs/handshake.yaml
+++ b/Documentation/netlink/specs/handshake.yaml
@@ -12,6 +12,10 @@ protocol: genetlink
doc: Netlink protocol to request a transport layer security handshake.
definitions:
+ -
+ name: session-tag-max-len
+ type: const
+ value: 255
-
type: enum
name: handler-class
@@ -87,6 +91,12 @@ attribute-sets:
name: remote-auth
type: u32
multi-attr: true
+ -
+ name: tag
+ type: string
+ checks:
+ max-len: session-tag-max-len
+ multi-attr: true
operations:
list:
@@ -124,6 +134,7 @@ operations:
- status
- sockfd
- remote-auth
+ - tag
mcast-groups:
list:
diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
index d7e40f594888..1ed309e475b4 100644
--- a/include/uapi/linux/handshake.h
+++ b/include/uapi/linux/handshake.h
@@ -10,6 +10,8 @@
#define HANDSHAKE_FAMILY_NAME "handshake"
#define HANDSHAKE_FAMILY_VERSION 1
+#define HANDSHAKE_SESSION_TAG_MAX_LEN 255
+
enum handshake_handler_class {
HANDSHAKE_HANDLER_CLASS_NONE,
HANDSHAKE_HANDLER_CLASS_TLSHD,
@@ -56,6 +58,7 @@ enum {
HANDSHAKE_A_DONE_STATUS = 1,
HANDSHAKE_A_DONE_SOCKFD,
HANDSHAKE_A_DONE_REMOTE_AUTH,
+ HANDSHAKE_A_DONE_TAG,
__HANDSHAKE_A_DONE_MAX,
HANDSHAKE_A_DONE_MAX = (__HANDSHAKE_A_DONE_MAX - 1)
diff --git a/net/handshake/genl.c b/net/handshake/genl.c
index 791c45671cd6..385583805e02 100644
--- a/net/handshake/genl.c
+++ b/net/handshake/genl.c
@@ -17,10 +17,11 @@ static const struct nla_policy handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HAN
};
/* HANDSHAKE_CMD_DONE - do */
-static const struct nla_policy handshake_done_nl_policy[HANDSHAKE_A_DONE_REMOTE_AUTH + 1] = {
+static const struct nla_policy handshake_done_nl_policy[HANDSHAKE_A_DONE_TAG + 1] = {
[HANDSHAKE_A_DONE_STATUS] = { .type = NLA_U32, },
[HANDSHAKE_A_DONE_SOCKFD] = { .type = NLA_S32, },
[HANDSHAKE_A_DONE_REMOTE_AUTH] = { .type = NLA_U32, },
+ [HANDSHAKE_A_DONE_TAG] = { .type = NLA_STRING, .len = HANDSHAKE_SESSION_TAG_MAX_LEN, },
};
/* Ops table for handshake */
@@ -36,7 +37,7 @@ static const struct genl_split_ops handshake_nl_ops[] = {
.cmd = HANDSHAKE_CMD_DONE,
.doit = handshake_nl_done_doit,
.policy = handshake_done_nl_policy,
- .maxattr = HANDSHAKE_A_DONE_REMOTE_AUTH,
+ .maxattr = HANDSHAKE_A_DONE_TAG,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
};
--
2.54.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/9] lib: Add a "tagset" data structure
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
2026-06-05 17:34 ` [PATCH 1/9] handshake: Require admin permission for DONE command Chuck Lever
2026-06-05 17:34 ` [PATCH 2/9] handshake: Add tags to "done" downcall Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 4/9] handshake: Pick up session tags passed during the DONE downcall Chuck Lever
` (6 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Access control mechanisms sometimes need to match metadata tags
between a session and a resource. A tagset provides efficient
membership testing and set intersection operations for this purpose.
The implementation uses a sorted array of string pointers. Unlike
hash tables, sorted arrays support efficient intersection without
needing to iterate one set and probe the other. Unlike rbtrees,
they require no per-element node allocation, minimizing memory
overhead for small sets typical of resource tagging.
Operation complexities:
- tagset_add(): O(1)
- tagset_finalize(): O(N log N) for sorting and deduplication
- tagset_is_member(): O(log N) via binary search
- tagset_intersection(): O(N + M) via merge comparison
The API follows a build-then-query pattern: callers initialize,
allocate capacity, add tags, then finalize before querying. Once
finalized, the tagset is suitable for concurrent read access.
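The build-then-query pattern and the complexities listed above can be
tried out in userspace with the sketch below. It is a stand-in, not
the kernel code: sketch_set holds borrowed pointers (so duplicates are
dropped, not freed), qsort() replaces the kernel's sort(), and every
sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in: an array of borrowed string pointers. */
struct sketch_set {
	const char **tags;
	size_t count;
};

static int sketch_cmp(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/* Sort, then drop adjacent duplicates in place: O(N log N). */
static void sketch_finalize(struct sketch_set *set)
{
	size_t i, j;

	assert(set != NULL);
	if (set->count < 2)
		return;
	qsort(set->tags, set->count, sizeof(*set->tags), sketch_cmp);
	for (i = 0, j = 1; j < set->count; j++)
		if (strcmp(set->tags[i], set->tags[j]) != 0)
			set->tags[++i] = set->tags[j];
	set->count = i + 1;
}

/* Binary search over the sorted array: O(log N). */
static bool sketch_is_member(const struct sketch_set *set, const char *tag)
{
	size_t low = 0, high = set->count;

	while (low < high) {
		size_t mid = low + (high - low) / 2;
		int cmp = strcmp(tag, set->tags[mid]);

		if (cmp == 0)
			return true;
		if (cmp < 0)
			high = mid;
		else
			low = mid + 1;
	}
	return false;
}

/* Merge-style walk of two sorted arrays: O(N + M), no probing. */
static bool sketch_intersects(const struct sketch_set *a,
			      const struct sketch_set *b)
{
	size_t i = 0, j = 0;

	while (i < a->count && j < b->count) {
		int cmp = strcmp(a->tags[i], b->tags[j]);

		if (cmp == 0)
			return true;
		if (cmp < 0)
			i++;
		else
			j++;
	}
	return false;
}
```

Both query helpers assume their arguments were finalized first; on an
unsorted array the binary search and the merge walk can miss matches.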
Consumers of the new APIs will be introduced in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Documentation/core-api/index.rst | 1 +
Documentation/core-api/tagset.rst | 225 ++++++++++++++++++++++++++++++++++++++
include/linux/tagset.h | 187 +++++++++++++++++++++++++++++++
lib/Makefile | 1 +
lib/tagset.c | 174 +++++++++++++++++++++++++++++
5 files changed, 588 insertions(+)
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 13769d5c40bf..d48c074e6b12 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -45,6 +45,7 @@ Library functionality that is used throughout the kernel.
idr
circular-buffers
rbtree
+ tagset
generic-radix-tree
packing
this_cpu_ops
diff --git a/Documentation/core-api/tagset.rst b/Documentation/core-api/tagset.rst
new file mode 100644
index 000000000000..2945cfa95ab4
--- /dev/null
+++ b/Documentation/core-api/tagset.rst
@@ -0,0 +1,225 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+======
+Tagset
+======
+
+Overview
+========
+
+A tagset is a set of strings, intended for resource tagging where
+metadata about a resource is represented simply by a name. The public
+API can be found in ``<linux/tagset.h>``.
+
+
+Initialization and Destruction
+==============================
+
+A tagset must be initialized before use::
+
+ struct tagset tags;
+ tagset_init(&tags);
+
+Before adding tags, allocate capacity::
+
+ if (!tagset_alloc(&tags, num_tags, GFP_KERNEL))
+ return -ENOMEM;
+
+When finished, release all resources::
+
+ tagset_destroy(&tags);
+
+This frees all tag strings and the array. The tagset is reinitialized
+and may be reused by calling ``tagset_alloc()`` again.
+
+
+Adding Tags
+===========
+
+Two functions are provided for adding tags. Both require that
+sufficient capacity has been allocated via ``tagset_alloc()``.
+All functions that allocate memory return false on failure; callers
+should check return values.
+
+``tagset_add()``
+ Adds a tag string that is already in kmalloc'd memory. On success,
+ the tagset takes ownership of the string and will free it when
+ the tagset is destroyed::
+
+ char *tag = kstrdup("mytag", GFP_KERNEL);
+ if (!tag || !tagset_add(&tags, tag))
+ kfree(tag); /* failed, caller must free */
+
+``tagset_add_dup()``
+ Duplicates the tag string internally. The caller retains ownership
+ of the original string::
+
+ if (!tagset_add_dup(&tags, "mytag", GFP_KERNEL))
+ return -ENOMEM;
+
+Both functions return true on success, false on failure. Tags must
+not be added after ``tagset_finalize()`` has been called. Use
+``GFP_ATOMIC`` when adding tags in atomic context.
+
+
+Finalizing
+==========
+
+After adding all tags, the tagset must be finalized before querying::
+
+ tagset_finalize(&tags);
+
+This sorts the array to enable efficient binary search and removes
+any duplicate tags. Calling ``tagset_is_member()`` or
+``tagset_intersection()`` on a non-finalized tagset produces
+undefined results.
+
+
+Querying Tags
+=============
+
+``tagset_is_empty()``
+ Returns true if the tagset contains no tags.
+
+``tagset_count()``
+ Returns the number of tags in the tagset. Before finalization,
+ this is the number of tags added; after finalization, this is
+ the number of unique tags.
+
+``tagset_is_member()``
+ Returns true if a tag is present in the tagset. Uses binary
+ search for O(log N) complexity::
+
+ if (tagset_is_member(&tags, "mytag"))
+ pr_info("tag found\n");
+
+``tagset_intersection()``
+ Returns true if two tagsets share at least one common tag.
+ Uses merge-style comparison for O(N+M) complexity::
+
+ if (tagset_intersection(&tags1, &tags2))
+ pr_info("sets overlap\n");
+
+
+Iteration
+=========
+
+Use ``tagset_for_each()`` to iterate over all tags::
+
+ unsigned int index;
+ char *tag;
+
+ tagset_for_each(&tags, index, tag)
+ pr_info("tag: %s\n", tag);
+
+Callers should not depend on the order in which tags are returned.
+Modifying the tagset during iteration produces undefined behavior.
+
+
+Copying
+=======
+
+``tagset_copy()`` duplicates all tags from one tagset to another::
+
+ struct tagset copy;
+ if (!tagset_copy(&copy, &original, GFP_KERNEL))
+ pr_err("copy failed\n");
+
+The source tagset should be finalized before copying. The destination
+tagset is initialized and ready for queries after this function
+returns (no separate ``tagset_finalize()`` call is needed). Each tag
+string is duplicated, so the two tagsets are fully independent after
+copying.
+
+
+Typical Usage Pattern
+=====================
+
+A typical usage pattern for building a tagset::
+
+ struct tagset tags;
+
+ tagset_init(&tags);
+ if (!tagset_alloc(&tags, count, GFP_KERNEL))
+ return -ENOMEM;
+
+ for (i = 0; i < count; i++) {
+ if (!tagset_add_dup(&tags, strings[i], GFP_KERNEL)) {
+ tagset_destroy(&tags);
+ return -ENOMEM;
+ }
+ }
+ tagset_finalize(&tags);
+
+ /* Now safe to query */
+ if (tagset_is_member(&tags, "target"))
+ do_something();
+
+ tagset_destroy(&tags);
+
+
+Thread Safety
+=============
+
+Tagsets have no internal locking. Callers provide synchronization
+between writers and readers.
+
+The build-and-finalize phase mutates the tagset; the post-finalize
+query phase reads from it. The two phases must be separated by a
+publication boundary, because ``tagset_finalize()`` itself carries
+no memory barrier. A reader that observes the finalized tagset
+before the writer's stores have propagated may see a stale
+``ts_count`` or a partially populated ``ts_tags[]`` array.
+
+Three publication patterns are sufficient:
+
+* Lock release after ``tagset_finalize()``, lock acquire before
+ each query. The matching unlock/lock pair supplies release and
+ acquire ordering.
+
+* ``rcu_assign_pointer()`` of the tagset pointer after
+ ``tagset_finalize()``, paired with ``rcu_dereference()`` inside
+ an RCU read-side critical section on the reader.
+
+* ``smp_store_release()`` of the tagset pointer after
+ ``tagset_finalize()``, paired with ``smp_load_acquire()`` on the
+ reader.
+
+Once published, the tagset must remain immutable until no further
+readers can observe it. ``tagset_destroy()`` is not safe against
+concurrent readers, and ``tagset_finalize()`` must not be called
+more than once. With RCU publication, callers typically defer
+destruction to a grace period (``synchronize_rcu()`` before
+``tagset_destroy()``, or ``tagset_destroy()`` from a
+``call_rcu()`` callback) so that in-flight readers drain before
+the storage is freed.
+
+
+Implementation
+==============
+
+Each tagset is rooted on the following structure::
+
+ struct tagset {
+ char **ts_tags;
+ unsigned int ts_count;
+ unsigned int ts_capacity;
+ bool ts_finalized;
+ };
+
+The implementation uses a sorted array of string pointers, providing
+O(log N) membership testing and O(N+M) intersection operations.
+
++------------------------+-------------+
+| Operation | Complexity |
++========================+=============+
+| tagset_add | O(1) |
++------------------------+-------------+
+| tagset_finalize | O(N log N) |
++------------------------+-------------+
+| tagset_is_member | O(log N) |
++------------------------+-------------+
+| tagset_intersection | O(N + M) |
++------------------------+-------------+
+| tagset_copy | O(N) |
++------------------------+-------------+
diff --git a/include/linux/tagset.h b/include/linux/tagset.h
new file mode 100644
index 000000000000..8b0a58add8fc
--- /dev/null
+++ b/include/linux/tagset.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Tagsets
+ *
+ * Copyright (c) 2025, Oracle and/or its affiliates.
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * A tagset is a set of strings. See
+ * Documentation/core-api/tagset.rst for how to use tagsets.
+ *
+ * Tagsets have no internal locking. Callers provide synchronization
+ * between writers and readers, including a release/acquire (RCU, or
+ * matching unlock/lock) publication boundary between tagset_finalize()
+ * and the first concurrent query, and a grace period (or equivalent
+ * reader drain) before tagset_destroy(). See the Thread Safety section
+ * of Documentation/core-api/tagset.rst.
+ */
+
+#ifndef _LINUX_TAGSET_H
+#define _LINUX_TAGSET_H
+
+#include <linux/bug.h>
+#include <linux/slab.h>
+
+struct tagset {
+ char **ts_tags;
+ unsigned int ts_count;
+ unsigned int ts_capacity;
+ bool ts_finalized;
+};
+
+#define TAGSET_INIT \
+ { .ts_tags = NULL, .ts_count = 0, .ts_capacity = 0, .ts_finalized = false }
+#define DEFINE_TAGSET(name) \
+ struct tagset name = TAGSET_INIT
+
+/**
+ * tagset_for_each - Iterate over items in a tagset
+ * @set: An initialized tagset containing zero or more items
+ * @index: Index counter for the iteration
+ * @tag: Tag retrieved from the tagset
+ */
+#define tagset_for_each(set, index, tag) \
+ for ((index) = 0; (index) < (set)->ts_count && \
+ ((tag) = (set)->ts_tags[index], true); (index)++)
+
+/**
+ * tagset_init - Initialize an empty tagset
+ * @set: tagset to be initialized
+ */
+static inline void tagset_init(struct tagset *set)
+{
+ set->ts_tags = NULL;
+ set->ts_count = 0;
+ set->ts_capacity = 0;
+ set->ts_finalized = false;
+}
+
+/**
+ * tagset_alloc - Pre-allocate space for tags
+ * @set: An initialized tagset
+ * @capacity: Number of tags to allocate space for
+ * @gfp: Memory allocation flags
+ *
+ * This function may only be called once per tagset. Calling it on a
+ * tagset that has already been allocated returns failure.
+ *
+ * @capacity may be zero. The tagset then represents an empty set:
+ * tagset_add() rejects further additions, and tagset_finalize(),
+ * tagset_is_member(), tagset_intersection(), and tagset_destroy()
+ * all handle it correctly. The same state is produced by
+ * DEFINE_TAGSET() and tagset_init() alone, so callers that know the
+ * set is empty may skip tagset_alloc() entirely.
+ *
+ * Return:
+ * %true: allocation succeeded (or @capacity was zero)
+ * %false: allocation failed or tagset already allocated
+ */
+static inline __must_check bool
+tagset_alloc(struct tagset *set, unsigned int capacity, gfp_t gfp)
+{
+ if (set->ts_tags)
+ return false;
+ if (capacity == 0)
+ return true;
+
+ set->ts_tags = kcalloc(capacity, sizeof(char *), gfp);
+ if (!set->ts_tags)
+ return false;
+ set->ts_capacity = capacity;
+ return true;
+}
+
+/**
+ * tagset_is_empty - Determine if a tagset contains any tags
+ * @set: An initialized tagset to be checked
+ *
+ * Return:
+ * %true: if @set is empty
+ * %false: if @set contains one or more tags
+ */
+static inline bool tagset_is_empty(const struct tagset *set)
+{
+ return set->ts_count == 0;
+}
+
+/**
+ * tagset_count - Return the number of tags in a tagset
+ * @set: An initialized tagset to be checked
+ *
+ * If called before tagset_finalize(), returns the number of tags
+ * added. If called after, returns the number of unique tags.
+ *
+ * Return:
+ * The number of tags in @set
+ */
+static inline unsigned int tagset_count(const struct tagset *set)
+{
+ return set->ts_count;
+}
+
+/**
+ * tagset_add - Add a tag to a tagset
+ * @set: An initialized tagset with available capacity
+ * @tag: non-NULL tag string to be added to @set, in kmalloc'd memory
+ *
+ * On success, @tag is now owned by @set and will be freed either
+ * by tagset_finalize() (if a content duplicate) or tagset_destroy().
+ * The tagset must have been allocated with sufficient capacity via
+ * tagset_alloc(). Tags must not be added after tagset_finalize() has
+ * been called. Callers must not hand the same pointer to tagset_add()
+ * more than once: ownership has already transferred, and a second add
+ * produces a double-free at tagset_destroy().
+ *
+ * Return:
+ * %true: @tag is now a member of @set
+ * %false: @tag could not be added (NULL or no capacity)
+ */
+static inline __must_check bool
+tagset_add(struct tagset *set, char *tag)
+{
+ if (WARN_ON_ONCE(set->ts_finalized))
+ return false;
+ if (!tag || set->ts_count >= set->ts_capacity)
+ return false;
+ set->ts_tags[set->ts_count++] = tag;
+ return true;
+}
+
+/**
+ * tagset_add_dup - Add a copy of a tag to a tagset
+ * @set: An initialized tagset with available capacity
+ * @tag: non-NULL tag string to be copied and added to @set
+ * @gfp: Memory allocation flags
+ *
+ * On success, @tag will have been copied into a kmalloc'd
+ * buffer. The caller can release @tag immediately. Tags must not
+ * be added after tagset_finalize() has been called.
+ *
+ * Return:
+ * %true: @tag is now a member of @set
+ * %false: @tag could not be added (NULL, no capacity, or ENOMEM)
+ */
+static inline __must_check bool
+tagset_add_dup(struct tagset *set, const char *tag, gfp_t gfp)
+{
+ char *entry;
+
+ if (WARN_ON_ONCE(set->ts_finalized))
+ return false;
+ if (!tag || set->ts_count >= set->ts_capacity)
+ return false;
+ entry = kstrdup(tag, gfp);
+ if (!entry)
+ return false;
+ set->ts_tags[set->ts_count++] = entry;
+ return true;
+}
+
+/* Implemented in lib/tagset.c */
+void tagset_finalize(struct tagset *set);
+void tagset_destroy(struct tagset *set);
+bool tagset_is_member(const struct tagset *set, const char *tag);
+bool tagset_copy(struct tagset *dest, const struct tagset *src, gfp_t gfp);
+bool tagset_intersection(const struct tagset *set1, const struct tagset *set2);
+
+#endif /* _LINUX_TAGSET_H */
diff --git a/lib/Makefile b/lib/Makefile
index f33a24bf1c19..4f3be192be5b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -61,6 +61,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
generic-radix-tree.o bitmap-str.o
obj-y += string_helpers.o
obj-y += hexdump.o
+obj-y += tagset.o
obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
obj-y += kstrtox.o
obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
diff --git a/lib/tagset.c b/lib/tagset.c
new file mode 100644
index 000000000000..a7e09895e370
--- /dev/null
+++ b/lib/tagset.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tagsets - a sorted set of strings
+ *
+ * Copyright (c) 2025, Oracle and/or its affiliates.
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ */
+
+#include <linux/bug.h>
+#include <linux/export.h>
+#include <linux/sort.h>
+#include <linux/tagset.h>
+
+static int tagset_cmp(const void *a, const void *b)
+{
+ const char * const *sa = a;
+ const char * const *sb = b;
+
+ return strcmp(*sa, *sb);
+}
+
+/**
+ * tagset_finalize - Sort the tagset and remove duplicates
+ * @set: An initialized tagset that has been populated
+ *
+ * This must be called after all tags have been added and before
+ * calling tagset_is_member() or tagset_intersection(). Duplicate
+ * tags are removed and their memory is freed.
+ */
+void tagset_finalize(struct tagset *set)
+{
+ unsigned int i, j;
+
+ if (WARN_ON_ONCE(set->ts_finalized))
+ return;
+ if (set->ts_count > 1) {
+ sort(set->ts_tags, set->ts_count, sizeof(char *),
+ tagset_cmp, NULL);
+
+ /* Remove duplicates in place */
+ for (i = 0, j = 1; j < set->ts_count; j++) {
+ if (strcmp(set->ts_tags[i], set->ts_tags[j]) == 0)
+ kfree(set->ts_tags[j]);
+ else
+ set->ts_tags[++i] = set->ts_tags[j];
+ }
+ set->ts_count = i + 1;
+ }
+ set->ts_finalized = true;
+}
+EXPORT_SYMBOL_GPL(tagset_finalize);
+
+/**
+ * tagset_destroy - Release tagset resources
+ * @set: tagset to be destroyed
+ */
+void tagset_destroy(struct tagset *set)
+{
+ unsigned int i;
+
+ for (i = 0; i < set->ts_count; i++)
+ kfree(set->ts_tags[i]);
+ kfree(set->ts_tags);
+ tagset_init(set);
+}
+EXPORT_SYMBOL_GPL(tagset_destroy);
+
+/**
+ * tagset_is_member - Check if a tag is already a member of a tagset
+ * @set: An initialized and finalized tagset to be checked
+ * @tag: tag string to search for
+ *
+ * Uses binary search. The tagset must have been finalized first.
+ *
+ * Return:
+ * %true: if @tag is a member of @set
+ * %false: if @tag is not a member of @set
+ */
+bool tagset_is_member(const struct tagset *set, const char *tag)
+{
+ unsigned int low = 0, high = set->ts_count;
+
+ WARN_ON_ONCE(!set->ts_finalized);
+ while (low < high) {
+ unsigned int mid = low + (high - low) / 2;
+ int cmp = strcmp(tag, set->ts_tags[mid]);
+
+ if (cmp == 0)
+ return true;
+ if (cmp < 0)
+ high = mid;
+ else
+ low = mid + 1;
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(tagset_is_member);
+
+/**
+ * tagset_copy - Duplicate tags to another tagset
+ * @dest: An empty, initialized tagset to be filled
+ * @src: An initialized tagset to be copied from
+ * @gfp: Memory allocation flags
+ *
+ * @dest must be initialized -- via TAGSET_INIT, DEFINE_TAGSET,
+ * tagset_init(), or a prior tagset_destroy() -- and must hold no
+ * tags. Passing a populated tagset leaks its contents.
+ *
+ * On success @dest is populated and ready for use; no call to
+ * tagset_finalize() is required. On failure @dest is left as an
+ * empty, finalized tagset, so callers can safely run queries
+ * against it without first checking the return value.
+ *
+ * Return:
+ * %true: All tags in @src were copied to @dest
+ * %false: A failure occurred; @dest is empty and finalized
+ */
+bool tagset_copy(struct tagset *dest, const struct tagset *src, gfp_t gfp)
+{
+ unsigned int i;
+
+ /* src is already sorted, so dest is too */
+ dest->ts_finalized = src->ts_finalized;
+ if (src->ts_count == 0)
+ return true;
+ if (!tagset_alloc(dest, src->ts_count, gfp))
+ goto out_fail;
+ for (i = 0; i < src->ts_count; i++) {
+ char *entry = kstrdup(src->ts_tags[i], gfp);
+
+ if (!entry)
+ goto out_fail;
+ dest->ts_tags[dest->ts_count++] = entry;
+ }
+ return true;
+
+out_fail:
+ tagset_destroy(dest);
+ dest->ts_finalized = true;
+ return false;
+}
+EXPORT_SYMBOL_GPL(tagset_copy);
+
+/**
+ * tagset_intersection - Report if there are common tags
+ * @set1: An initialized and finalized tagset
+ * @set2: Another initialized and finalized tagset
+ *
+ * Uses merge-style comparison of two sorted arrays for O(N+M)
+ * complexity.
+ *
+ * Return:
+ * %true: @set1 and @set2 have at least one common tag
+ * %false: @set1 and @set2 have no tags in common
+ */
+bool tagset_intersection(const struct tagset *set1, const struct tagset *set2)
+{
+ unsigned int i = 0, j = 0;
+
+ WARN_ON_ONCE(!set1->ts_finalized);
+ WARN_ON_ONCE(!set2->ts_finalized);
+ while (i < set1->ts_count && j < set2->ts_count) {
+ int cmp = strcmp(set1->ts_tags[i], set2->ts_tags[j]);
+
+ if (cmp == 0)
+ return true;
+ if (cmp < 0)
+ i++;
+ else
+ j++;
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(tagset_intersection);
--
2.54.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/9] handshake: Pick up session tags passed during the DONE downcall
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (2 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 3/9] lib: Add a "tagset" data structure Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 5/9] handshake: Add a kunit test for the completion gate Chuck Lever
` (5 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Upper Layer Protocols such as NFS require access control based on
TLS certificate characteristics. Parsing X.509 in the kernel
duplicates work mature userspace libraries already do. The tlshd
daemon can evaluate certificates against administrator-defined
policies and assign tags indicating which policies a certificate
satisfies.
This change collects session tags from the DONE netlink downcall
and passes them to handshake consumers via their completion
callback. Consumers can then make authorization decisions based on
tag membership without certificate parsing. For example, NFSD can
restrict export access to connections whose certificate earned a
particular tag, delegating certificate policy to tlshd
configuration.
The tagset is populated by iterating HANDSHAKE_A_DONE_TAG attributes
from the netlink message and finalized before delivery to consumers.
The genl family runs DONE with parallel_ops, so duplicate or
concurrent DONE downcalls for the same socket can both reach the
same handshake_req. Each handler collects into a private local
tagset, takes a one-completion gate, and on winning publishes the
set into req->hr_tags by struct assignment. Keeping the gate
region free of GFP_KERNEL allocations preserves the cancel
contract: handshake_req_cancel() returning false means the
consumer callback has run or is about to, with no sleeping work
remaining. Otherwise a concurrent svc_sock_free() observing the
gate as taken would free callback data while the delayed callback
is still pending. Split handshake_complete() into a try/finish
pair so the netlink handler can collect tags before the gate and
publish them after winning, refusing duplicates with -EBUSY.
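The collect, gate, publish ordering described above can be sketched in userspace C. This is a simplified model under stated assumptions: a C11 atomic_bool stands in for the HANDSHAKE_F_REQ_COMPLETED bit, an integer stands in for the tagset, and every model_* name is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a request with a one-shot gate. */
struct model_req {
	atomic_bool completed;	/* models HANDSHAKE_F_REQ_COMPLETED */
	int published_tags;	/* models req->hr_tags */
	int done_calls;		/* number of times the callback ran */
};

static void model_req_init(struct model_req *req)
{
	atomic_init(&req->completed, false);
	req->published_tags = 0;
	req->done_calls = 0;
}

/* Exactly one caller ever sees true; models test_and_set_bit(). */
static bool model_try_complete(struct model_req *req)
{
	return !atomic_exchange(&req->completed, true);
}

/* Winner only: deliver completion. No sleeping work belongs here. */
static void model_finish_complete(struct model_req *req)
{
	assert(atomic_load(&req->completed));
	req->done_calls++;
}

/*
 * Models the DONE handler. @collected_tags was gathered before this
 * call, so any sleeping work is already finished when the gate is
 * taken. Returns 0 for the winner, -EBUSY for a duplicate.
 */
static int model_done(struct model_req *req, int collected_tags)
{
	if (!model_try_complete(req))
		return -EBUSY;	/* loser discards its private set */
	req->published_tags = collected_tags;
	model_finish_complete(req);
	return 0;
}
```

In this model the only work between winning the gate and running the callback is a plain assignment, which corresponds to the no-sleeping requirement placed on the gate region.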
When the tag count exceeds HANDSHAKE_MAX_SESSIONTAGS, the handler
truncates the list, logs a single pr_warn_once(), and returns
success. Failing the DONE with -E2BIG would signal the daemon
definitively but would also tear down a handshake the operator
almost certainly wants to keep; the trade-off favors session
continuity and treats overrun as a misconfiguration to fix in
tlshd. A subsequent patch advertises the cap on every ACCEPT
reply so tlshd can avoid overrunning in the first place.
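The handler in this patch also validates each tag payload before counting it. A minimal userspace sketch of the validation and clamping rules follows, assuming a raw byte buffer in place of a netlink attribute. MODEL_MAX_TAGS and both model_* functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_TAGS 64	/* mirrors HANDSHAKE_MAX_SESSIONTAGS */

/*
 * A payload is acceptable if, after dropping one optional trailing
 * NUL, it is non-empty and contains no further NUL bytes.
 */
static bool model_tag_payload_ok(const char *payload, size_t len)
{
	assert(payload != NULL || len == 0);

	if (len > 0 && payload[len - 1] == '\0')
		len--;
	return len > 0 && memchr(payload, '\0', len) == NULL;
}

/* Over-cap lists are truncated rather than rejected. */
static size_t model_clamp_tag_count(size_t count)
{
	return count > MODEL_MAX_TAGS ? MODEL_MAX_TAGS : count;
}
```

The two rules fail differently: a malformed payload rejects the whole list, while an over-long list is only shortened, matching the trade-off argued for in the paragraph above.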
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Documentation/networking/tls-handshake.rst | 56 ++++++++++++++-
drivers/nvme/host/tcp.c | 3 +-
drivers/nvme/target/tcp.c | 3 +-
include/net/handshake.h | 26 ++++++-
net/handshake/handshake.h | 6 ++
net/handshake/netlink.c | 109 ++++++++++++++++++++++++++++-
net/handshake/request.c | 66 ++++++++++++++---
net/handshake/tlshd.c | 5 +-
net/sunrpc/svcsock.c | 5 +-
net/sunrpc/xprtsock.c | 5 +-
10 files changed, 266 insertions(+), 18 deletions(-)
diff --git a/Documentation/networking/tls-handshake.rst b/Documentation/networking/tls-handshake.rst
index 4f7bc1087df9..352842a74e6b 100644
--- a/Documentation/networking/tls-handshake.rst
+++ b/Documentation/networking/tls-handshake.rst
@@ -169,7 +169,8 @@ The synopsis of this function is:
.. code-block:: c
typedef void (*tls_done_func_t)(void *data, int status,
- key_serial_t peerid);
+ key_serial_t peerid,
+ const struct tagset *tags);
The consumer provides a cookie in the @ta_data field of the
tls_handshake_args structure that is returned in the @data parameter of
@@ -200,6 +201,10 @@ The @peerid parameter contains the serial number of a key containing the
remote peer's identity or the value TLS_NO_PEERID if the session is not
authenticated.
+The @tags parameter points to a tagset containing session metadata
+assigned by the handshake agent. See the "Session Tags" section
+below for details on lifetime and safe access patterns.
+
A best practice is to close and destroy the socket immediately if the
handshake failed.
@@ -220,3 +225,52 @@ received message data is TLS record data or session metadata.
See tls.rst for details on how a kTLS consumer recognizes incoming
(decrypted) application data, alerts, and handshake packets once the
socket has been promoted to use the TLS ULP.
+
+
+Session Tags
+============
+
+When a TLS handshake completes successfully, the handshake agent may
+assign metadata tags to the session. These tags enable kernel consumers
+to make authorization decisions based on certificate characteristics
+without parsing X.509 certificates directly.
+
+The handshake agent evaluates the peer's certificate against
+administrator-defined policies and assigns tags indicating which
+policies the certificate satisfies. For example, an administrator
+might configure the agent to assign an "internal-servers" tag when a
+certificate's Issuer DN matches a particular corporate CA.
+
+Tags are delivered to the consumer via the @tags parameter of the
+completion callback. The tagset is valid only for the duration of the
+callback. Consumers needing persistent access must copy the tagset
+using tagset_copy() before returning.
+
+To check whether a session has a particular tag:
+
+.. code-block:: c
+
+ if (tagset_is_member(tags, "internal-servers")) {
+ /* Certificate matched the internal-servers policy */
+ }
+
+To check whether a session has any tag from a small fixed set:
+
+.. code-block:: c
+
+ if (tagset_is_member(tags, "admin") ||
+ tagset_is_member(tags, "operator")) {
+ /* Certificate matched admin or operator policy */
+ }
+
+When the required set is dynamic (e.g., parsed from an export option),
+construct a tagset and use tagset_intersection() to test for any
+overlap. See Documentation/core-api/tagset.rst for the
+initialization, add, and finalize sequence.
+
+If the handshake failed or no tags were assigned, the tagset is
+empty. The handshake layer always delivers a finalized tagset to
+the callback, so consumers may call tagset_is_member() and
+tagset_intersection() unconditionally without a separate guard.
+
+See Documentation/core-api/tagset.rst for the complete tagset API.
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 15d36d6a728e..d08c03f89661 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1691,7 +1691,8 @@ static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
qid, queue->io_cpu);
}
-static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
+static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid,
+ const struct tagset *tags)
{
struct nvme_tcp_queue *queue = data;
struct nvme_tcp_ctrl *ctrl = queue->ctrl;
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 20f150d17a96..7e0132b8ac93 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1821,7 +1821,8 @@ static int nvmet_tcp_tls_key_lookup(struct nvmet_tcp_queue *queue,
}
static void nvmet_tcp_tls_handshake_done(void *data, int status,
- key_serial_t peerid)
+ key_serial_t peerid,
+ const struct tagset *tags)
{
struct nvmet_tcp_queue *queue = data;
diff --git a/include/net/handshake.h b/include/net/handshake.h
index 8ebd4f9ed26e..fa43b108c2a8 100644
--- a/include/net/handshake.h
+++ b/include/net/handshake.h
@@ -10,6 +10,14 @@
#ifndef _NET_HANDSHAKE_H
#define _NET_HANDSHAKE_H
+#include <linux/tagset.h>
+
+/*
+ * Per-handshake cap on session tags. Bounds the cost of
+ * tagset_intersection() in consumer authorization checks.
+ */
+#define HANDSHAKE_MAX_SESSIONTAGS 64
+
enum {
TLS_NO_KEYRING = 0,
TLS_NO_PEERID = 0,
@@ -17,8 +25,24 @@ enum {
TLS_NO_PRIVKEY = 0,
};
+/**
+ * typedef tls_done_func_t - TLS handshake completion callback
+ * @data: opaque context pointer set via tls_handshake_args.ta_data
+ * @status: zero on success, otherwise a negative errno
+ * @peerid: serial number of peer identity key, or TLS_NO_PEERID
+ * @tags: session tags assigned by the handshake agent
+ *
+ * Invoked when a TLS handshake completes, either successfully or with
+ * an error. The @tags parameter points to session metadata assigned
+ * by the handshake agent based on certificate policy evaluation. The
+ * tagset is empty when the handshake failed or no policies matched.
+ *
+ * The @tags pointer is valid only for the duration of this callback.
+ * Callers requiring persistent access must copy via tagset_copy().
+ */
typedef void (*tls_done_func_t)(void *data, int status,
- key_serial_t peerid);
+ key_serial_t peerid,
+ const struct tagset *tags);
struct tls_handshake_args {
struct socket *ta_sock;
diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
index a48163765a7a..3b32c7971682 100644
--- a/net/handshake/handshake.h
+++ b/net/handshake/handshake.h
@@ -10,6 +10,8 @@
#ifndef _INTERNAL_HANDSHAKE_H
#define _INTERNAL_HANDSHAKE_H
+#include <linux/tagset.h>
+
/* Per-net namespace context */
struct handshake_net {
spinlock_t hn_lock; /* protects next 3 fields */
@@ -34,6 +36,7 @@ struct handshake_req {
const struct handshake_proto *hr_proto;
struct sock *hr_sk;
void (*hr_odestruct)(struct sock *sk);
+ struct tagset hr_tags;
/* Always the last field */
char hr_priv[];
@@ -86,6 +89,9 @@ struct handshake_req *handshake_req_hash_lookup(struct sock *sk);
struct handshake_req *handshake_req_next(struct handshake_net *hn, int class);
int handshake_req_submit(struct socket *sock, struct handshake_req *req,
gfp_t flags);
+bool handshake_try_complete(struct handshake_req *req);
+void handshake_finish_complete(struct handshake_req *req, unsigned int status,
+ struct genl_info *info);
void handshake_complete(struct handshake_req *req, unsigned int status,
struct genl_info *info);
bool handshake_req_cancel(struct sock *sk);
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
index b989456fc4c5..0c2e68360a73 100644
--- a/net/handshake/netlink.c
+++ b/net/handshake/netlink.c
@@ -16,6 +16,7 @@
#include <net/sock.h>
#include <net/genetlink.h>
+#include <net/handshake.h>
#include <net/netns/generic.h>
#include <kunit/visibility.h>
@@ -133,11 +134,83 @@ int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info)
return err;
}
+/*
+ * Pick up session tags from the DONE downcall payload into a
+ * caller-owned tagset. No handshake_req fields are mutated here:
+ * concurrent DONE handlers each populate a private tagset, and
+ * the winner of the completion gate publishes its set into
+ * req->hr_tags by struct assignment.
+ *
+ * Return: 0 if tags were processed (some may have been dropped on
+ * per-tag or bulk allocation pressure, or truncated at
+ * HANDSHAKE_MAX_SESSIONTAGS); a negative errno if the payload was
+ * rejected and no tags collected.
+ */
+static int handshake_get_sessiontags(struct tagset *tags,
+ struct genl_info *info)
+{
+ unsigned int count = 0;
+ struct nlattr *nla;
+ int rem;
+
+ /*
+ * Reject empty tags and embedded NUL bytes. NLA_STRING payloads
+ * may arrive with or without a trailing NUL, and nla_strdup()
+ * appends the terminator when copying into the tagset.
+ * NLA_NUL_STRING would accept a NUL at any offset, and the
+ * YAML schema cannot express "no NUL except as terminator,"
+ * so the check belongs here.
+ */
+ nlmsg_for_each_attr_type(nla, HANDSHAKE_A_DONE_TAG, info->nlhdr,
+ GENL_HDRLEN, rem) {
+ const char *src = nla_data(nla);
+ size_t srclen = nla_len(nla);
+
+ if (srclen > 0 && src[srclen - 1] == '\0')
+ srclen--;
+ if (srclen == 0 || memchr(src, '\0', srclen))
+ return -EINVAL;
+ count++;
+ }
+ if (count == 0)
+ return 0;
+ if (count > HANDSHAKE_MAX_SESSIONTAGS) {
+ pr_warn_once("handshake: too many session tags (%u > %u)\n",
+ count, HANDSHAKE_MAX_SESSIONTAGS);
+ count = HANDSHAKE_MAX_SESSIONTAGS;
+ }
+ if (!tagset_alloc(tags, count, GFP_KERNEL)) {
+ pr_warn_once("handshake: dropping session tags under memory pressure\n");
+ return 0;
+ }
+
+ nlmsg_for_each_attr_type(nla, HANDSHAKE_A_DONE_TAG, info->nlhdr,
+ GENL_HDRLEN, rem) {
+ char *tag;
+
+ /*
+ * The first pass may have clamped count to
+ * HANDSHAKE_MAX_SESSIONTAGS. Stop here to avoid
+ * alloc/free churn on excess attributes.
+ */
+ if (tagset_count(tags) >= count)
+ break;
+
+ tag = nla_strdup(nla, GFP_KERNEL);
+ if (!tag)
+ continue;
+ if (!tagset_add(tags, tag))
+ kfree(tag);
+ }
+ return 0;
+}
+
int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info)
{
struct net *net = sock_net(skb->sk);
struct handshake_req *req;
struct socket *sock;
+ DEFINE_TAGSET(tags);
int fd, status, err;
if (GENL_REQ_ATTR_CHECK(info, HANDSHAKE_A_DONE_SOCKFD))
@@ -161,10 +234,42 @@ int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info)
status = -EIO;
if (info->attrs[HANDSHAKE_A_DONE_STATUS])
status = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_STATUS]);
+ err = 0;
+ if (!status) {
+ int ret = handshake_get_sessiontags(&tags, info);
- handshake_complete(req, status, info);
+ if (ret < 0) {
+ err = ret;
+ trace_handshake_cmd_done_err(net, req, sock->sk, err);
+ status = -EIO;
+ }
+ }
+
+ /*
+ * Take the unique-completer gate after collection so the gate
+ * region contains no GFP_KERNEL allocations. handshake_req_cancel()
+ * observers must not see the gate as taken while sleeping work
+ * remains here, or they will free callback data while the consumer
+ * callback is still pending.
+ */
+ if (!handshake_try_complete(req)) {
+ trace_handshake_cmd_done_err(net, req, sock->sk, -EBUSY);
+ tagset_destroy(&tags);
+ sockfd_put(sock);
+ return -EBUSY;
+ }
+
+ /*
+ * Publish the locally-collected tagset. req->hr_tags was
+ * initialized empty by handshake_req_alloc() and no other writer
+ * can reach it past the gate, so a struct assignment cleanly
+ * transfers ownership of the heap-allocated tag array.
+ */
+ req->hr_tags = tags;
+
+ handshake_finish_complete(req, status, info);
sockfd_put(sock);
- return 0;
+ return err;
}
static unsigned int handshake_net_id;
diff --git a/net/handshake/request.c b/net/handshake/request.c
index 2829adbeb149..2215a9916727 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -79,6 +79,7 @@ static void handshake_req_destroy(struct handshake_req *req)
req->hr_proto->hp_destroy(req);
rhashtable_remove_fast(&handshake_rhashtbl, &req->hr_rhash,
handshake_rhash_params);
+ tagset_destroy(&req->hr_tags);
kfree(req);
}
@@ -124,6 +125,7 @@ struct handshake_req *handshake_req_alloc(const struct handshake_proto *proto,
return NULL;
INIT_LIST_HEAD(&req->hr_list);
+ tagset_init(&req->hr_tags);
req->hr_proto = proto;
return req;
}
@@ -284,19 +286,67 @@ int handshake_req_submit(struct socket *sock, struct handshake_req *req,
}
EXPORT_SYMBOL(handshake_req_submit);
-void handshake_complete(struct handshake_req *req, unsigned int status,
- struct genl_info *info)
+/**
+ * handshake_try_complete - Take the unique-completer gate
+ * @req: handshake request being completed
+ *
+ * The DONE netlink op runs with parallel_ops, so duplicate or
+ * concurrent DONE downcalls for the same socket can both reach
+ * the same @req. The gate ensures that exactly one caller drives
+ * completion to the consumer.
+ *
+ * The gate is also observable via handshake_req_cancel(): once
+ * taken, cancel returns %false to indicate completion is in
+ * flight. Callers must therefore not perform sleeping work
+ * between handshake_try_complete() and handshake_finish_complete(),
+ * or a concurrent cancel will see the gate taken while the
+ * consumer callback has not yet run, and may free callback data
+ * out from under it.
+ *
+ * Return: %true if the caller has won the gate and is now
+ * responsible for calling handshake_finish_complete(); %false
+ * otherwise.
+ */
+bool handshake_try_complete(struct handshake_req *req)
+{
+ return !test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED, &req->hr_flags);
+}
+
+/**
+ * handshake_finish_complete - Deliver completion to the consumer
+ * @req: handshake request being completed
+ * @status: completion status to deliver
+ * @info: netlink message context, or NULL
+ *
+ * Caller must have won the gate via handshake_try_complete().
+ * Finalizes hr_tags, invokes the consumer's done callback, and
+ * drops the sock reference taken at submit.
+ */
+void handshake_finish_complete(struct handshake_req *req, unsigned int status,
+ struct genl_info *info)
{
struct sock *sk = req->hr_sk;
struct net *net = sock_net(sk);
- if (!test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED, &req->hr_flags)) {
- trace_handshake_complete(net, req, sk, status);
- req->hr_proto->hp_done(req, status, info);
+ trace_handshake_complete(net, req, sk, status);
+ /*
+ * Finalize unconditionally so consumers may call
+ * tagset_is_member() and tagset_intersection() without
+ * tripping the !ts_finalized WARN on paths where no DONE
+ * tags were collected.
+ */
+ tagset_finalize(&req->hr_tags);
+ req->hr_proto->hp_done(req, status, info);
- /* Handshake request is no longer pending */
- sock_put(sk);
- }
+ /* Handshake request is no longer pending */
+ sock_put(sk);
+}
+
+void handshake_complete(struct handshake_req *req, unsigned int status,
+ struct genl_info *info)
+{
+ if (handshake_try_complete(req))
+ handshake_finish_complete(req, status, info);
}
EXPORT_SYMBOL_IF_KUNIT(handshake_complete);
diff --git a/net/handshake/tlshd.c b/net/handshake/tlshd.c
index 8f9532a15f43..9bcaeba74f8c 100644
--- a/net/handshake/tlshd.c
+++ b/net/handshake/tlshd.c
@@ -26,7 +26,8 @@
struct tls_handshake_req {
void (*th_consumer_done)(void *data, int status,
- key_serial_t peerid);
+ key_serial_t peerid,
+ const struct tagset *tags);
void *th_consumer_data;
int th_type;
@@ -105,7 +106,7 @@ static void tls_handshake_done(struct handshake_req *req,
set_bit(HANDSHAKE_F_REQ_SESSION, &req->hr_flags);
treq->th_consumer_done(treq->th_consumer_data, -status,
- treq->th_peerid[0]);
+ treq->th_peerid[0], &req->hr_tags);
}
#if IS_ENABLED(CONFIG_KEYS)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 50e5e7f5b762..b4ad84910687 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -453,13 +453,16 @@ static void svc_tcp_kill_temp_xprt(struct svc_xprt *xprt)
* @data: address of xprt to wake
* @status: status of handshake
* @peerid: serial number of key containing the remote peer's identity
+ * @tags: session tags assigned by the handshake agent; valid only for
+ * the duration of this callback
*
* If a security policy is specified as an export option, we don't
* have a specific export here to check. So we set a "TLS session
* is present" flag on the xprt and let an upper layer enforce local
* security policy.
*/
-static void svc_tcp_handshake_done(void *data, int status, key_serial_t peerid)
+static void svc_tcp_handshake_done(void *data, int status, key_serial_t peerid,
+ const struct tagset *tags)
{
struct svc_xprt *xprt = data;
struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 2e1fe6013361..b5e88ad64d63 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2590,9 +2590,12 @@ static int xs_tcp_tls_finish_connecting(struct rpc_xprt *lower_xprt,
* @data: address of xprt to wake
* @status: status of handshake
* @peerid: serial number of key containing the remote's identity
+ * @tags: session tags assigned by the handshake agent; valid only for
+ * the duration of this callback
*
*/
-static void xs_tls_handshake_done(void *data, int status, key_serial_t peerid)
+static void xs_tls_handshake_done(void *data, int status, key_serial_t peerid,
+ const struct tagset *tags)
{
struct rpc_xprt *lower_xprt = data;
struct sock_xprt *lower_transport =
--
2.54.0
* [PATCH 5/9] handshake: Add a kunit test for the completion gate
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (3 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 4/9] handshake: Pick up session tags passed during the DONE downcall Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 6/9] handshake: advertise the session-tag cap to user space Chuck Lever
` (4 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The DONE netlink op runs with parallel_ops, so duplicate or
concurrent DONE downcalls for the same socket can both reach
the same handshake_req. The split try/finish pair guarantees
that exactly one caller drives completion to the consumer's
hp_done callback; this invariant lets the session-tag publish
step transfer ownership safely.
Exercise the gate's idempotency with a kunit case: submit and
accept a request, call handshake_try_complete() twice in
sequence and assert the second returns false, drive
handshake_finish_complete() once, then call handshake_complete()
again and confirm hp_done fired exactly once across the
sequence.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/handshake/handshake-test.c | 72 ++++++++++++++++++++++++++++++++++++++++++
net/handshake/request.c | 2 ++
2 files changed, 74 insertions(+)
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 55442b2f518a..c172b7a9750f 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -430,6 +430,74 @@ static void handshake_req_cancel_test3(struct kunit *test)
fput(filp);
}
+static int handshake_try_complete_done_count;
+
+static void test_done_count_func(struct handshake_req *req, unsigned int status,
+ struct genl_info *info)
+{
+ handshake_try_complete_done_count++;
+}
+
+static struct handshake_proto handshake_req_alloc_proto_count = {
+ .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD,
+ .hp_accept = test_accept_func,
+ .hp_done = test_done_count_func,
+};
+
+static void handshake_try_complete_test1(struct kunit *test)
+{
+ struct handshake_req *req, *next;
+ struct handshake_net *hn;
+ struct socket *sock;
+ struct file *filp;
+ struct net *net;
+ bool first, second;
+ int err;
+
+ /* Arrange */
+ handshake_try_complete_done_count = 0;
+
+ req = handshake_req_alloc(&handshake_req_alloc_proto_count, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, req);
+
+ err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
+ &sock, 1);
+ KUNIT_ASSERT_EQ(test, err, 0);
+
+ filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
+ KUNIT_ASSERT_NOT_NULL(test, sock->sk);
+ sock->file = filp;
+
+ err = handshake_req_submit(sock, req, GFP_KERNEL);
+ KUNIT_ASSERT_EQ(test, err, 0);
+
+ net = sock_net(sock->sk);
+ hn = handshake_pernet(net);
+ KUNIT_ASSERT_NOT_NULL(test, hn);
+
+ /* Pretend to accept this request */
+ next = handshake_req_next(hn, HANDSHAKE_HANDLER_CLASS_TLSHD);
+ KUNIT_ASSERT_PTR_EQ(test, req, next);
+
+ /* Act */
+ first = handshake_try_complete(req);
+ second = handshake_try_complete(req);
+ handshake_finish_complete(req, 0, NULL);
+ /* handshake_complete() re-enters the gate via
+ * handshake_try_complete(). With the gate already taken,
+ * hp_done must not fire a second time.
+ */
+ handshake_complete(req, 0, NULL);
+
+ /* Assert */
+ KUNIT_EXPECT_TRUE(test, first);
+ KUNIT_EXPECT_FALSE(test, second);
+ KUNIT_EXPECT_EQ(test, handshake_try_complete_done_count, 1);
+
+ fput(filp);
+}
+
static struct handshake_req *handshake_req_destroy_test;
static void test_destroy_func(struct handshake_req *req)
@@ -522,6 +590,10 @@ static struct kunit_case handshake_api_test_cases[] = {
.name = "req_cancel after done",
.run_case = handshake_req_cancel_test3,
},
+ {
+ .name = "try_complete gate is exclusive",
+ .run_case = handshake_try_complete_test1,
+ },
{
.name = "req_destroy works",
.run_case = handshake_req_destroy_test1,
diff --git a/net/handshake/request.c b/net/handshake/request.c
index 2215a9916727..96c27efa9958 100644
--- a/net/handshake/request.c
+++ b/net/handshake/request.c
@@ -311,6 +311,7 @@ bool handshake_try_complete(struct handshake_req *req)
{
return !test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED, &req->hr_flags);
}
+EXPORT_SYMBOL_IF_KUNIT(handshake_try_complete);
/**
* handshake_finish_complete - Deliver completion to the consumer
@@ -341,6 +342,7 @@ void handshake_finish_complete(struct handshake_req *req, unsigned int status,
/* Handshake request is no longer pending */
sock_put(sk);
}
+EXPORT_SYMBOL_IF_KUNIT(handshake_finish_complete);
void handshake_complete(struct handshake_req *req, unsigned int status,
struct genl_info *info)
--
2.54.0
* [PATCH 6/9] handshake: advertise the session-tag cap to user space
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (4 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 5/9] handshake: Add a kunit test for the completion gate Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 7/9] SUNRPC: Copy the TLS session tags when they are available Chuck Lever
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The kernel caps the number of session tags it accepts in a DONE
downcall at HANDSHAKE_MAX_SESSIONTAGS. tlshd has no way to learn
this cap today: a daemon built against newer UAPI headers than
the running kernel silently overruns it, and the kernel truncates
the list with one pr_warn_once per boot. Truncation is
recoverable but the underlying misconfiguration is easy to miss.
Carry the cap on every ACCEPT reply as HANDSHAKE_A_ACCEPT_MAX_TAGS,
a u32 attribute populated by the kernel. User space reads the
value at ACCEPT time and can gate its DONE-side tag list against
it, turning over-cap into a user-space policy choice rather than
a silent kernel-side truncation.
Putting the cap in the ACCEPT reply keeps a single source of
truth and lets the kernel raise it in a later release without
bumping the daemon's UAPI header dependency.
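How an agent applies the advertised cap is left to user space. One possibility is sketched below, assuming the agent has already extracted the u32 cap from the ACCEPT reply. agent_tags_to_send() is a hypothetical helper and is not part of tlshd or of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical agent-side helper: decide how many tags to place in
 * the DONE message given the cap read at ACCEPT time. *truncated
 * tells the caller whether it must drop tags, so it can log the
 * condition or apply its own policy instead of relying on the
 * kernel's silent truncation.
 */
static size_t agent_tags_to_send(size_t assigned, uint32_t cap,
				 bool *truncated)
{
	assert(truncated != NULL);

	*truncated = assigned > cap;
	return *truncated ? cap : assigned;
}
```

Returning the truncation flag separately keeps the policy decision (warn, fail the handshake, or prioritize certain tags) with the caller.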
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Documentation/netlink/specs/handshake.yaml | 4 ++++
Documentation/networking/tls-handshake.rst | 7 +++++++
include/uapi/linux/handshake.h | 1 +
net/handshake/tlshd.c | 5 +++++
4 files changed, 17 insertions(+)
diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
index df36ff7da18f..614d31bee656 100644
--- a/Documentation/netlink/specs/handshake.yaml
+++ b/Documentation/netlink/specs/handshake.yaml
@@ -78,6 +78,9 @@ attribute-sets:
-
name: keyring
type: u32
+ -
+ name: max-tags
+ type: u32
-
name: done
attributes:
@@ -123,6 +126,7 @@ operations:
- certificate
- peername
- keyring
+ - max-tags
-
name: done
doc: Handler reports handshake completion
diff --git a/Documentation/networking/tls-handshake.rst b/Documentation/networking/tls-handshake.rst
index 352842a74e6b..ea2e090a1ed8 100644
--- a/Documentation/networking/tls-handshake.rst
+++ b/Documentation/networking/tls-handshake.rst
@@ -273,4 +273,11 @@ empty. The handshake layer always delivers a finalized tagset to
the callback, so consumers may call tagset_is_member() and
tagset_intersection() unconditionally without a separate guard.
+The tagset delivered to the consumer may contain fewer tags than
+the handshake agent assigned. The kernel caps the per-DONE tag
+count at HANDSHAKE_MAX_SESSIONTAGS, and individual tags within
+the cap may be dropped under memory pressure. The cap rides on
+every ACCEPT reply so the agent can size its DONE-side tag list
+to it; see Documentation/netlink/specs/handshake.yaml.
+
See Documentation/core-api/tagset.rst for the complete tagset API.
diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
index 1ed309e475b4..1445983e7369 100644
--- a/include/uapi/linux/handshake.h
+++ b/include/uapi/linux/handshake.h
@@ -49,6 +49,7 @@ enum {
HANDSHAKE_A_ACCEPT_CERTIFICATE,
HANDSHAKE_A_ACCEPT_PEERNAME,
HANDSHAKE_A_ACCEPT_KEYRING,
+ HANDSHAKE_A_ACCEPT_MAX_TAGS,
__HANDSHAKE_A_ACCEPT_MAX,
HANDSHAKE_A_ACCEPT_MAX = (__HANDSHAKE_A_ACCEPT_MAX - 1)
diff --git a/net/handshake/tlshd.c b/net/handshake/tlshd.c
index 9bcaeba74f8c..eae4a4a0a9ef 100644
--- a/net/handshake/tlshd.c
+++ b/net/handshake/tlshd.c
@@ -238,6 +238,11 @@ static int tls_handshake_accept(struct handshake_req *req,
goto out_cancel;
}
+ ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MAX_TAGS,
+ HANDSHAKE_MAX_SESSIONTAGS);
+ if (ret < 0)
+ goto out_cancel;
+
ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_AUTH_MODE,
treq->th_auth_mode);
if (ret < 0)
--
2.54.0
* [PATCH 7/9] SUNRPC: Copy the TLS session tags when they are available
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (5 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 6/9] handshake: advertise the session-tag cap to user space Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 8/9] NFSD: Implement export tagging Chuck Lever
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
When a server handshake completes successfully, tlshd might provide
a set of TLS session tags. SUNRPC can save these within the svc_xprt;
NFSD can later use them to authorize or reject operations that target
NFS exports that have a similar set of tags associated with them.
A second handshake on the same transport would destroy the saved
tags while other workers read them, so svcauth_tls_accept() now
refuses AUTH_TLS on a transport that already carries a TLS session,
and svc_tcp_handshake() rechecks the session flag under XPT_BUSY to
close the race with a handshake that completes concurrently.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc_xprt.h | 2 ++
net/sunrpc/svc_xprt.c | 11 ++++++++---
net/sunrpc/svcauth_unix.c | 12 ++++++++++++
net/sunrpc/svcsock.c | 33 ++++++++++++++++++++++++++++++++-
4 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index da2a2531e110..15f678d00876 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -9,6 +9,7 @@
#define SUNRPC_SVC_XPRT_H
#include <linux/sunrpc/svc.h>
+#include <linux/tagset.h>
struct module;
@@ -79,6 +80,7 @@ struct svc_xprt {
const struct cred *xpt_cred;
struct rpc_xprt *xpt_bc_xprt; /* NFSv4.1 backchannel */
struct rpc_xprt_switch *xpt_bc_xps; /* NFSv4.1 backchannel */
+ struct tagset xpt_handshake_tags; /* TLS session tags */
};
/* flag bits for xpt_flags */
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 63d1002e63e7..1638fc09db8b 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -168,6 +168,8 @@ static void svc_xprt_free(struct kref *kref)
struct svc_xprt *xprt =
container_of(kref, struct svc_xprt, xpt_ref);
struct module *owner = xprt->xpt_class->xcl_owner;
+
+ tagset_destroy(&xprt->xpt_handshake_tags);
if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags))
svcauth_unix_info_release(xprt);
put_cred(xprt->xpt_cred);
@@ -188,9 +190,12 @@ void svc_xprt_put(struct svc_xprt *xprt)
}
EXPORT_SYMBOL_GPL(svc_xprt_put);
-/*
- * Called by transport drivers to initialize the transport independent
- * portion of the transport instance.
+/**
+ * svc_xprt_init - initialize transport-independent fields of an xprt
+ * @net: Network namespace
+ * @xcl: Transport class
+ * @xprt: Transport to be initialized
+ * @serv: RPC service
*/
void svc_xprt_init(struct net *net, struct svc_xprt_class *xcl,
struct svc_xprt *xprt, struct svc_serv *serv)
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 64a2658faddb..7a779e773107 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -1129,6 +1129,18 @@ svcauth_tls_accept(struct svc_rqst *rqstp)
return SVC_DENIED;
}
+ /*
+ * AUTH_TLS initiates a handshake. Refuse it on a transport
+ * that already has a TLS session: a second handshake would
+ * destroy xpt_handshake_tags. This test can pass before a
+ * concurrent handshake completes; svc_tcp_handshake()
+ * rechecks under XPT_BUSY before destroying the tags.
+ */
+ if (test_bit(XPT_TLS_SESSION, &xprt->xpt_flags)) {
+ rqstp->rq_auth_stat = rpc_autherr_badcred;
+ return SVC_DENIED;
+ }
+
/* Signal that mapping to nobody uid/gid is required */
cred->cr_uid = INVALID_UID;
cred->cr_gid = INVALID_GID;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b4ad84910687..cc06ed3075db 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -470,7 +470,18 @@ static void svc_tcp_handshake_done(void *data, int status, key_serial_t peerid,
if (!status) {
if (peerid != TLS_NO_PEERID)
set_bit(XPT_PEER_AUTH, &xprt->xpt_flags);
- set_bit(XPT_TLS_SESSION, &xprt->xpt_flags);
+ /*
+ * Leaving XPT_TLS_SESSION clear on copy failure makes
+ * svc_tcp_handshake() close the connection. The tags
+ * cannot be recovered later on this transport because
+ * a second handshake is refused once a session is
+ * established; a reconnect retries both the handshake
+ * and the copy.
+ */
+ if (tagset_copy(&xprt->xpt_handshake_tags, tags, GFP_KERNEL))
+ set_bit(XPT_TLS_SESSION, &xprt->xpt_flags);
+ else
+ pr_warn_ratelimited("svc: failed to copy TLS session tags\n");
}
clear_bit(XPT_HANDSHAKE, &xprt->xpt_flags);
complete_all(&svsk->sk_handshake_done);
@@ -481,6 +492,9 @@ static void svc_tcp_handshake_done(void *data, int status, key_serial_t peerid,
* svc_tcp_handshake - Perform a transport-layer security handshake
* @xprt: connected transport endpoint
*
+ * If the transport already has a TLS session, the handshake request
+ * is declined: a fresh handshake would destroy the saved session
+ * tags.
*/
static void svc_tcp_handshake(struct svc_xprt *xprt)
{
@@ -493,8 +507,25 @@ static void svc_tcp_handshake(struct svc_xprt *xprt)
};
int ret;
+ /*
+ * The XPT_TLS_SESSION test in svcauth_tls_accept() is not
+ * race-free: a worker can pass it before a concurrent
+ * handshake completes and raise XPT_HANDSHAKE afterwards.
+ * XPT_BUSY serializes handshake starts, so this test cannot
+ * go stale: a set bit here means an established session
+ * whose tags other workers may be reading. Decline to start
+ * a handshake that would destroy them.
+ */
+ if (test_bit(XPT_TLS_SESSION, &xprt->xpt_flags)) {
+ clear_bit(XPT_HANDSHAKE, &xprt->xpt_flags);
+ set_bit(XPT_DATA, &xprt->xpt_flags);
+ svc_xprt_enqueue(xprt);
+ return;
+ }
+
trace_svc_tls_upcall(xprt);
+ tagset_destroy(&xprt->xpt_handshake_tags);
clear_bit(XPT_TLS_SESSION, &xprt->xpt_flags);
init_completion(&svsk->sk_handshake_done);
--
2.54.0
* [PATCH 8/9] NFSD: Implement export tagging
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (6 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 7/9] SUNRPC: Copy the TLS session tags when they are available Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-05 17:34 ` [PATCH 9/9] NFSD: Add allow_tags to the netlink export interface Chuck Lever
2026-06-06 13:26 ` [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Jeff Layton
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Today NFSD treats TLS client peer identity as a boolean: either a
peer is identified (authenticated) or it is not. Some deployments
need finer authorization than that. A single certificate may
authenticate several distinct actors, and an administrator may
wish to grant different levels of access to different peers
presenting the same certificate.
Once a TLS handshake completes, tlshd hands the kernel a list of
tags associated with the session. For exports with an allow_tags
list configured, NFSD tests the handshake tags against that list
and grants access only when the session carries at least one
matching tag. Exports with no allow_tags list continue to grant
access to any authenticated peer, preserving existing behavior.
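The matching rule in the paragraph above could be sketched roughly
as follows. This is an illustration, not the series' code: struct
tag_list and both helpers are hypothetical, the real series uses
struct tagset and tagset_intersection(), and the sketch assumes the
peer has already been authenticated with mTLS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat tag list; the series uses struct tagset. */
struct tag_list {
	const char *const *tags;
	size_t count;
};

/* True if the two lists share at least one tag. */
static bool tag_lists_intersect(const struct tag_list *a,
				const struct tag_list *b)
{
	for (size_t i = 0; i < a->count; i++)
		for (size_t j = 0; j < b->count; j++)
			if (strcmp(a->tags[i], b->tags[j]) == 0)
				return true;
	return false;
}

/*
 * An export without an allow list admits any authenticated peer;
 * otherwise the session must carry at least one listed tag.
 */
static bool export_allows_session(const struct tag_list *allow,
				  const struct tag_list *session)
{
	if (allow->count == 0)
		return true;
	return tag_lists_intersect(session, allow);
}
```

The nested loop is quadratic in the list sizes, which is one reason
to cap both lists at a small constant.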
Tags accompany only mTLS sessions, so allow_tags is meaningful
only when xprtsec resolves to mtls alone. svc_export_parse()
rejects an allow_tags list paired with any other xprtsec mode,
making the administrator state the combination explicitly rather
than allowing a default xprtsec setting to silently expose the
export to plaintext or anonymous-TLS peers.
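As a rough sketch of that consistency rule: the MODE_* constants and
validate_export_policy() below are hypothetical names chosen for
illustration, with values assumed to mirror a three-bit xprtsec
mask. The patch itself implements the rule in check_allow_tags().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mode bits, assumed to mirror the xprtsec mask. */
enum {
	MODE_NONE = 0x1,	/* plaintext */
	MODE_TLS  = 0x2,	/* anonymous TLS */
	MODE_MTLS = 0x4,	/* mutually authenticated TLS */
	MODE_ALL  = 0x7,	/* default when xprtsec is not given */
};

/* A non-empty allow list is valid only with mtls and nothing else. */
static int validate_export_policy(unsigned long modes,
				  size_t allow_tag_count)
{
	if (allow_tag_count != 0 && modes != MODE_MTLS)
		return -EINVAL;
	return 0;
}
```

Because the default mode set is "all", an allow list given without
an explicit mtls-only xprtsec setting is rejected in this sketch.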
Tags are parsed from exportfs during cache fill and freed when
the export cache entry is released. Tagset ownership transfers
to the cache entry on update so memory is managed correctly
across the cache lifecycle.
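The ownership hand-off can be pictured with a small user-space
model. Everything here is an assumption made for illustration:
struct tag_owner and its helpers are hypothetical, and malloc/free
stand in for the kernel allocators behind struct tagset.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of a heap-allocated tag array. */
struct tag_owner {
	char **tags;
	size_t count;
};

static void tag_owner_init(struct tag_owner *t)
{
	t->tags = NULL;
	t->count = 0;
}

/* Append a private copy of @tag; returns 0 on success, -1 on failure. */
static int tag_owner_add(struct tag_owner *t, const char *tag)
{
	size_t len = strlen(tag) + 1;
	char **grown = realloc(t->tags, (t->count + 1) * sizeof(*grown));
	char *copy;

	if (!grown)
		return -1;
	t->tags = grown;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, tag, len);
	t->tags[t->count++] = copy;
	return 0;
}

/* Free everything and return to the empty state; safe to repeat. */
static void tag_owner_destroy(struct tag_owner *t)
{
	for (size_t i = 0; i < t->count; i++)
		free(t->tags[i]);
	free(t->tags);
	tag_owner_init(t);
}

/* Move: @dst takes the storage, @src is reset to empty. */
static void tag_owner_move(struct tag_owner *dst, struct tag_owner *src)
{
	*dst = *src;
	tag_owner_init(src);
}
```

Resetting the source after the move is what lets the parser's error
and exit path destroy its temporary unconditionally without freeing
memory the cache entry now owns.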
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/export.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++--
fs/nfsd/export.h | 11 ++++++++
fs/nfsd/trace.h | 19 +++++++++++++
include/net/handshake.h | 4 +++
4 files changed, 105 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index a47c90f40422..a2aaa3cd6c52 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -18,6 +18,7 @@
#include <linux/exportfs.h>
#include <linux/sunrpc/svc_xprt.h>
#include <net/genetlink.h>
+#include <net/handshake.h>
#include <uapi/linux/nfsd_netlink.h>
#include "nfsd.h"
@@ -627,6 +628,7 @@ static void svc_export_release(struct rcu_head *rcu_head)
struct svc_export *exp = container_of(rcu_head, struct svc_export,
ex_rcu);
+ tagset_destroy(&exp->ex_allow_tags);
nfsd4_fslocs_free(&exp->ex_fslocs);
export_stats_destroy(exp->ex_stats);
kfree(exp->ex_stats);
@@ -1285,6 +1287,55 @@ static int xprtsec_parse(char **mesg, char *buf, struct svc_export *exp)
return 0;
}
+static int tags_parse(char **mesg, char *buf, struct tagset *tags)
+{
+ unsigned int i, listsize;
+ int err;
+
+ /* more than one allow_tags */
+ if (tags->ts_finalized)
+ return -EINVAL;
+
+ err = get_uint(mesg, &listsize);
+ if (err)
+ return -EINVAL;
+ if (listsize == 0 || listsize > NFSD_MAX_ALLOW_TAGS)
+ return -EINVAL;
+ if (!tagset_alloc(tags, listsize, GFP_KERNEL))
+ return -ENOMEM;
+
+ for (i = 0; i < listsize; i++) {
+ int len;
+
+ len = qword_get(mesg, buf, PAGE_SIZE);
+ if (len <= 0 || len > HANDSHAKE_SESSION_TAG_MAX_LEN)
+ return -EINVAL;
+ if (strlen(buf) != len)
+ return -EINVAL;
+ if (!tagset_add_dup(tags, buf, GFP_KERNEL))
+ return -ENOMEM;
+ }
+ tagset_finalize(tags);
+
+ return 0;
+}
+
+/*
+ * Session tags are issued only with an mTLS handshake, so an
+ * allow_tags list is meaningful only when xprtsec resolves to
+ * mtls alone. Reject combinations that would otherwise let
+ * plaintext or anonymous-TLS peers reach the export without
+ * ever consulting the tag list. Every producer of a svc_export
+ * must apply this check after it has resolved both fields.
+ */
+static int check_allow_tags(const struct svc_export *exp)
+{
+ if (!tagset_is_empty(&exp->ex_allow_tags) &&
+ exp->ex_xprtsec_modes != NFSEXP_XPRTSEC_MTLS)
+ return -EINVAL;
+ return 0;
+}
+
static inline int
nfsd_uuid_parse(char **mesg, char *buf, unsigned char **puuid)
{
@@ -1346,6 +1397,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
exp.cd = cd;
exp.ex_devid_map = NULL;
exp.ex_xprtsec_modes = NFSEXP_XPRTSEC_ALL;
+ tagset_init(&exp.ex_allow_tags);
/* expiry */
err = get_expiry(&mesg, &exp.h.expiry_time);
@@ -1389,6 +1441,8 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
err = secinfo_parse(&mesg, buf, &exp);
else if (strcmp(buf, "xprtsec") == 0)
err = xprtsec_parse(&mesg, buf, &exp);
+ else if (strcmp(buf, "allow_tags") == 0)
+ err = tags_parse(&mesg, buf, &exp.ex_allow_tags);
else
/* quietly ignore unknown words and anything
* following. Newer user-space can try to set
@@ -1399,6 +1453,10 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
goto out4;
}
+ err = check_allow_tags(&exp);
+ if (err)
+ goto out4;
+
err = check_export(&exp.ex_path, &exp.ex_flags, exp.ex_uuid);
if (err)
goto out4;
@@ -1441,6 +1499,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
} else
err = -ENOMEM;
out4:
+ tagset_destroy(&exp.ex_allow_tags);
nfsd4_fslocs_free(&exp.ex_fslocs);
kfree(exp.ex_uuid);
out3:
@@ -1568,6 +1627,8 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
new->ex_flavors[i] = item->ex_flavors[i];
}
new->ex_xprtsec_modes = item->ex_xprtsec_modes;
+ new->ex_allow_tags = item->ex_allow_tags;
+ tagset_init(&item->ex_allow_tags);
}
static struct cache_head *svc_export_alloc(void)
@@ -1588,6 +1649,8 @@ static struct cache_head *svc_export_alloc(void)
return NULL;
}
+ tagset_init(&i->ex_allow_tags);
+
return &i->h;
}
@@ -1815,8 +1878,14 @@ __be32 check_xprtsec_policy(struct svc_export *exp, struct svc_rqst *rqstp)
}
if (exp->ex_xprtsec_modes & NFSEXP_XPRTSEC_MTLS) {
if (test_bit(XPT_TLS_SESSION, &xprt->xpt_flags) &&
- test_bit(XPT_PEER_AUTH, &xprt->xpt_flags))
- return nfs_ok;
+ test_bit(XPT_PEER_AUTH, &xprt->xpt_flags)) {
+ if (tagset_is_empty(&exp->ex_allow_tags))
+ return nfs_ok;
+ if (tagset_intersection(&xprt->xpt_handshake_tags,
+ &exp->ex_allow_tags))
+ return nfs_ok;
+ trace_nfsd_export_tags_denied(exp);
+ }
}
return nfserr_wrongsec;
}
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index d2b09cd76145..c315cb4f0538 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -7,6 +7,7 @@
#include <linux/sunrpc/cache.h>
#include <linux/percpu_counter.h>
+#include <linux/tagset.h>
#include <uapi/linux/nfsd/export.h>
#include <linux/nfs4.h>
@@ -47,6 +48,15 @@ struct exp_flavor_info {
u32 flags;
};
+/*
+ * Cap on the number of tags in an export's allow_tags list. This
+ * is an export policy limit, independent of the per-handshake cap
+ * on session tags (HANDSHAKE_MAX_SESSIONTAGS). It bounds the cost
+ * of the tagset_intersection() that check_xprtsec_policy() runs
+ * per request against a tagged export.
+ */
+#define NFSD_MAX_ALLOW_TAGS 64
+
/* Per-export stats */
enum {
EXP_STATS_FH_STALE,
@@ -78,6 +88,7 @@ struct svc_export {
struct rcu_head ex_rcu;
unsigned long ex_xprtsec_modes;
struct export_stats *ex_stats;
+ struct tagset ex_allow_tags;
};
/* an "export key" (expkey) maps a filehandlefragement to an
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index d01496aa3cf8..a426da9efebf 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -467,6 +467,25 @@ TRACE_EVENT(nfsd_export_update,
)
);
+TRACE_EVENT(nfsd_export_tags_denied,
+ TP_PROTO(
+ const struct svc_export *exp
+ ),
+ TP_ARGS(exp),
+ TP_STRUCT__entry(
+ __string(path, exp->ex_path.dentry->d_name.name)
+ __string(auth_domain, exp->ex_client->name)
+ ),
+ TP_fast_assign(
+ __assign_str(path);
+ __assign_str(auth_domain);
+ ),
+ TP_printk("path=%s domain=%s",
+ __get_str(path),
+ __get_str(auth_domain)
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_io_class,
TP_PROTO(struct svc_rqst *rqstp,
struct svc_fh *fhp,
diff --git a/include/net/handshake.h b/include/net/handshake.h
index fa43b108c2a8..d7411dbf5253 100644
--- a/include/net/handshake.h
+++ b/include/net/handshake.h
@@ -11,10 +11,14 @@
#define _NET_HANDSHAKE_H
#include <linux/tagset.h>
+#include <uapi/linux/handshake.h>
/*
* Per-handshake cap on session tags. Bounds the cost of
* tagset_intersection() in consumer authorization checks.
+ * The per-tag byte limit is HANDSHAKE_SESSION_TAG_MAX_LEN,
+ * generated from Documentation/netlink/specs/handshake.yaml
+ * and enforced by the netlink policy at the kernel boundary.
*/
#define HANDSHAKE_MAX_SESSIONTAGS 64
--
2.54.0
* [PATCH 9/9] NFSD: Add allow_tags to the netlink export interface
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (7 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 8/9] NFSD: Implement export tagging Chuck Lever
@ 2026-06-05 17:34 ` Chuck Lever
2026-06-06 13:26 ` [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Jeff Layton
9 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-05 17:34 UTC (permalink / raw)
To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Andrew Morton, John Fastabend, Sabrina Dubroca, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The legacy exportfs cache path accepts an allow_tags clause that
restricts an export to mTLS sessions carrying at least one matching
session tag. The netlink-based svc_export interface had no such
attribute, so administrators configuring exports via netlink could
not request tag enforcement: nfsd_nl_parse_one_export() always
left ex_allow_tags empty, and check_xprtsec_policy() then granted
access to any authenticated peer.
Extend the svc-export attribute set with allow-tags and parse it
in nfsd_nl_parse_one_export(). Apply the same xprtsec=mtls
consistency check as svc_export_parse() so the netlink path
refuses contradictory security policy rather than silently exposing
a tagged export to plaintext or anonymous-TLS peers.
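The per-tag validation applied to each attribute payload might look
like the sketch below. tag_payload_is_valid() and its max_len
parameter are hypothetical: in the patch the length limit comes from
the netlink policy, and only the empty-tag and embedded-NUL checks
are open-coded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * @src and @len describe a string payload that may or may not end
 * in a NUL byte. One trailing NUL is tolerated; an empty tag, an
 * over-long tag, or a NUL anywhere else is rejected.
 */
static bool tag_payload_is_valid(const char *src, size_t len,
				 size_t max_len)
{
	if (len > 0 && src[len - 1] == '\0')
		len--;			/* ignore one trailing NUL */
	if (len == 0 || len > max_len)
		return false;
	return memchr(src, '\0', len) == NULL;
}
```

Rejecting embedded NUL bytes up front matters because a later
string comparison would stop at the first NUL and silently compare
only a prefix of the tag.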
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Documentation/netlink/specs/nfsd.yaml | 10 ++++++
fs/nfsd/export.c | 68 +++++++++++++++++++++++++++++++++--
fs/nfsd/netlink.c | 4 ++-
fs/nfsd/netlink.h | 3 +-
include/uapi/linux/nfsd_netlink.h | 1 +
5 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
index 8f36fadd68f7..5cbdc1dab7e3 100644
--- a/Documentation/netlink/specs/nfsd.yaml
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -7,6 +7,10 @@ uapi-header: linux/nfsd_netlink.h
doc: NFSD configuration over generic netlink.
definitions:
+ -
+ name: handshake-session-tag-max-len
+ type: const
+ header: uapi/linux/handshake.h
-
type: flags
name: cache-type
@@ -253,6 +257,12 @@ attribute-sets:
-
name: fsid
type: s32
+ -
+ name: allow-tags
+ type: string
+ checks:
+ max-len: handshake-session-tag-max-len
+ multi-attr: true
-
name: svc-export-reqs
attributes:
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index a2aaa3cd6c52..25802de2de40 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -831,6 +831,7 @@ static struct svc_export *svc_export_update(struct svc_export *new,
static struct svc_export *svc_export_lookup(struct svc_export *);
static int check_export(const struct path *path, int *flags,
unsigned char *uuid);
+static int check_allow_tags(const struct svc_export *exp);
/**
* nfsd_nl_parse_one_export - parse one svc_export entry from a netlink message
@@ -845,14 +846,14 @@ static int check_export(const struct path *path, int *flags,
static int nfsd_nl_parse_one_export(struct cache_detail *cd,
struct nlattr *attr)
{
- struct nlattr *tb[NFSD_A_SVC_EXPORT_FSID + 1];
+ struct nlattr *tb[NFSD_A_SVC_EXPORT_ALLOW_TAGS + 1];
struct auth_domain *dom = NULL;
struct svc_export exp = {}, *expp;
struct nlattr *secinfo_attr;
struct timespec64 boot;
int err, rem;
- err = nla_parse_nested(tb, NFSD_A_SVC_EXPORT_FSID, attr,
+ err = nla_parse_nested(tb, NFSD_A_SVC_EXPORT_ALLOW_TAGS, attr,
nfsd_svc_export_nl_policy, NULL);
if (err)
return err;
@@ -993,6 +994,68 @@ static int nfsd_nl_parse_one_export(struct cache_detail *cd,
}
}
+ /* allow-tags (multi-attr string) */
+ if (tb[NFSD_A_SVC_EXPORT_ALLOW_TAGS]) {
+ struct nlattr *tag_attr;
+ unsigned int count = 0;
+
+ /*
+ * The NLA_STRING policy does not guarantee a
+ * terminating NUL, so each tag is copied with
+ * the length-aware nla_strdup(). Embedded NUL
+ * bytes are rejected here because the policy
+ * cannot express that check; a tag containing
+ * one could never match a handshake-supplied
+ * tag, which net/handshake rejects the same
+ * way.
+ */
+ nla_for_each_nested_type(tag_attr,
+ NFSD_A_SVC_EXPORT_ALLOW_TAGS,
+ attr, rem) {
+ const char *src = nla_data(tag_attr);
+ size_t srclen = nla_len(tag_attr);
+
+ if (srclen > 0 && src[srclen - 1] == '\0')
+ srclen--;
+ if (srclen == 0 ||
+ memchr(src, '\0', srclen)) {
+ err = -EINVAL;
+ goto out_uuid;
+ }
+ count++;
+ }
+ if (count > NFSD_MAX_ALLOW_TAGS) {
+ err = -EINVAL;
+ goto out_uuid;
+ }
+ if (!tagset_alloc(&exp.ex_allow_tags, count,
+ GFP_KERNEL)) {
+ err = -ENOMEM;
+ goto out_uuid;
+ }
+ nla_for_each_nested_type(tag_attr,
+ NFSD_A_SVC_EXPORT_ALLOW_TAGS,
+ attr, rem) {
+ char *tag;
+
+ tag = nla_strdup(tag_attr, GFP_KERNEL);
+ if (!tag) {
+ err = -ENOMEM;
+ goto out_uuid;
+ }
+ if (!tagset_add(&exp.ex_allow_tags, tag)) {
+ kfree(tag);
+ err = -ENOMEM;
+ goto out_uuid;
+ }
+ }
+ tagset_finalize(&exp.ex_allow_tags);
+ }
+
+ err = check_allow_tags(&exp);
+ if (err)
+ goto out_uuid;
+
err = check_export(&exp.ex_path, &exp.ex_flags,
exp.ex_uuid);
if (err)
@@ -1026,6 +1089,7 @@ static int nfsd_nl_parse_one_export(struct cache_detail *cd,
}
out_uuid:
+ tagset_destroy(&exp.ex_allow_tags);
kfree(exp.ex_uuid);
out_fslocs:
nfsd4_fslocs_free(&exp.ex_fslocs);
diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
index fbee3676d253..4db094b1021f 100644
--- a/fs/nfsd/netlink.c
+++ b/fs/nfsd/netlink.c
@@ -10,6 +10,7 @@
#include "netlink.h"
#include <uapi/linux/nfsd_netlink.h>
+#include <uapi/linux/handshake.h>
/* Common nested types */
const struct nla_policy nfsd_auth_flavor_nl_policy[NFSD_A_AUTH_FLAVOR_FLAGS + 1] = {
@@ -41,7 +42,7 @@ const struct nla_policy nfsd_sock_nl_policy[NFSD_A_SOCK_TRANSPORT_NAME + 1] = {
[NFSD_A_SOCK_TRANSPORT_NAME] = { .type = NLA_NUL_STRING, },
};
-const struct nla_policy nfsd_svc_export_nl_policy[NFSD_A_SVC_EXPORT_FSID + 1] = {
+const struct nla_policy nfsd_svc_export_nl_policy[NFSD_A_SVC_EXPORT_ALLOW_TAGS + 1] = {
[NFSD_A_SVC_EXPORT_SEQNO] = { .type = NLA_U64, },
[NFSD_A_SVC_EXPORT_CLIENT] = { .type = NLA_NUL_STRING, },
[NFSD_A_SVC_EXPORT_PATH] = { .type = NLA_NUL_STRING, },
@@ -55,6 +56,7 @@ const struct nla_policy nfsd_svc_export_nl_policy[NFSD_A_SVC_EXPORT_FSID + 1] =
[NFSD_A_SVC_EXPORT_XPRTSEC] = NLA_POLICY_MASK(NLA_U32, 0x7),
[NFSD_A_SVC_EXPORT_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0x3ffff),
[NFSD_A_SVC_EXPORT_FSID] = { .type = NLA_S32, },
+ [NFSD_A_SVC_EXPORT_ALLOW_TAGS] = { .type = NLA_STRING, .len = HANDSHAKE_SESSION_TAG_MAX_LEN, },
};
const struct nla_policy nfsd_version_nl_policy[NFSD_A_VERSION_ENABLED + 1] = {
diff --git a/fs/nfsd/netlink.h b/fs/nfsd/netlink.h
index af41aa0d4a65..133e99a0a3fc 100644
--- a/fs/nfsd/netlink.h
+++ b/fs/nfsd/netlink.h
@@ -11,6 +11,7 @@
#include <net/genetlink.h>
#include <uapi/linux/nfsd_netlink.h>
+#include <uapi/linux/handshake.h>
/* Common nested types */
extern const struct nla_policy nfsd_auth_flavor_nl_policy[NFSD_A_AUTH_FLAVOR_FLAGS + 1];
@@ -18,7 +19,7 @@ extern const struct nla_policy nfsd_expkey_nl_policy[NFSD_A_EXPKEY_PATH + 1];
extern const struct nla_policy nfsd_fslocation_nl_policy[NFSD_A_FSLOCATION_PATH + 1];
extern const struct nla_policy nfsd_fslocations_nl_policy[NFSD_A_FSLOCATIONS_LOCATION + 1];
extern const struct nla_policy nfsd_sock_nl_policy[NFSD_A_SOCK_TRANSPORT_NAME + 1];
-extern const struct nla_policy nfsd_svc_export_nl_policy[NFSD_A_SVC_EXPORT_FSID + 1];
+extern const struct nla_policy nfsd_svc_export_nl_policy[NFSD_A_SVC_EXPORT_ALLOW_TAGS + 1];
extern const struct nla_policy nfsd_version_nl_policy[NFSD_A_VERSION_ENABLED + 1];
int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
index f5b75d5caba9..23a42c26ede0 100644
--- a/include/uapi/linux/nfsd_netlink.h
+++ b/include/uapi/linux/nfsd_netlink.h
@@ -165,6 +165,7 @@ enum {
NFSD_A_SVC_EXPORT_XPRTSEC,
NFSD_A_SVC_EXPORT_FLAGS,
NFSD_A_SVC_EXPORT_FSID,
+ NFSD_A_SVC_EXPORT_ALLOW_TAGS,
__NFSD_A_SVC_EXPORT_MAX,
NFSD_A_SVC_EXPORT_MAX = (__NFSD_A_SVC_EXPORT_MAX - 1)
--
2.54.0
* Re: [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD)
2026-06-05 17:34 [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Chuck Lever
` (8 preceding siblings ...)
2026-06-05 17:34 ` [PATCH 9/9] NFSD: Add allow_tags to the netlink export interface Chuck Lever
@ 2026-06-06 13:26 ` Jeff Layton
2026-06-06 14:43 ` Chuck Lever
9 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2026-06-06 13:26 UTC (permalink / raw)
To: Chuck Lever, Donald Hunter, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan, Andrew Morton, John Fastabend, Sabrina Dubroca,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
On Fri, 2026-06-05 at 13:34 -0400, Chuck Lever wrote:
> NFSD and similar upper-layer services want access-control decisions
> based on TLS peer-certificate characteristics, but in-kernel x.509
> parsing would duplicate work mature userspace libraries already do.
> This series gives tlshd a way to evaluate certificates against
> admin-defined policy and report matching policies back to the kernel
> as opaque string tags. The handshake layer plumbs the tags through to
> the upper-layer consumer's completion callback; intersection against
> per-resource tag sets stays the consumer's problem.
>
> Four architectural choices shape the series, only one of which is
> visible in any single patch.
>
> The tagging vocabulary is opaque to the kernel. tlshd decides what
> each tag means; the handshake layer and its consumers only test
> membership. This keeps x.509 out of the kernel and lets policy evolve
> at userspace speed. Any future attribute the kernel wants to gate on
> must be expressed as a tag rather than as a new netlink field per
> attribute.
>
> DONE gains a privilege check (patch 1) as a prerequisite, not as
> cleanup. Without it, an unprivileged process guessing a sockfd could
> submit a forged DONE and effectively grant or deny tag membership
> for a real handshake. Once tags carry authorization weight, that
> pre-existing gap becomes load-bearing. The fix predates tags in
> principle and carries a Fixes: tag, but it sits at the head of this
> series so the rest of the work has a trustworthy foundation.
>
> HANDSHAKE_MAX_SESSIONTAGS is advertised on every ACCEPT reply as
> HANDSHAKE_A_ACCEPT_MAX_TAGS (patch 6), so tlshd can size its
> DONE-side tag list against the kernel's runtime limit rather than
> guessing from header constants. If a daemon overruns anyway, the
> DONE handler truncates and logs one pr_warn_once rather than
> returning -E2BIG: tearing down a handshake the operator almost
> certainly wants to keep is a worse outcome than dropping a few
> tags. The truncation path is defense-in-depth for a buggy or
> stale agent, not the primary signal.
>
> The tagset helper (patch 3) is split out as a generic library so
> NFSD export tagging (patches 8 and 9) can use it without further
> churn in net/handshake/.
>
> ---
> Chuck Lever (9):
> handshake: Require admin permission for DONE command
> handshake: Add tags to "done" downcall
> lib: Add a "tagset" data structure
> handshake: Pick up session tags passed during the DONE downcall
> handshake: Add a kunit test for the completion gate
> handshake: advertise the session-tag cap to user space
> SUNRPC: Copy the TLS session tags when they are available
> NFSD: Implement export tagging
> NFSD: Add allow_tags to the netlink export interface
>
> Documentation/core-api/index.rst | 1 +
> Documentation/core-api/tagset.rst | 225 +++++++++++++++++++++++++++++
> Documentation/netlink/specs/handshake.yaml | 16 ++
> Documentation/netlink/specs/nfsd.yaml | 10 ++
> Documentation/networking/tls-handshake.rst | 63 +++++++-
> drivers/nvme/host/tcp.c | 3 +-
> drivers/nvme/target/tcp.c | 3 +-
> fs/nfsd/export.c | 141 +++++++++++++++++-
> fs/nfsd/export.h | 11 ++
> fs/nfsd/netlink.c | 4 +-
> fs/nfsd/netlink.h | 3 +-
> fs/nfsd/trace.h | 19 +++
> include/linux/sunrpc/svc_xprt.h | 2 +
> include/linux/tagset.h | 187 ++++++++++++++++++++++++
> include/net/handshake.h | 30 +++-
> include/uapi/linux/handshake.h | 4 +
> include/uapi/linux/nfsd_netlink.h | 1 +
> lib/Makefile | 1 +
> lib/tagset.c | 174 ++++++++++++++++++++++
> net/handshake/genl.c | 7 +-
> net/handshake/handshake-test.c | 72 +++++++++
> net/handshake/handshake.h | 6 +
> net/handshake/netlink.c | 109 +++++++++++++-
> net/handshake/request.c | 68 ++++++++-
> net/handshake/tlshd.c | 10 +-
> net/sunrpc/svc_xprt.c | 11 +-
> net/sunrpc/svcauth_unix.c | 12 ++
> net/sunrpc/svcsock.c | 38 ++++-
> net/sunrpc/xprtsock.c | 5 +-
> 29 files changed, 1205 insertions(+), 31 deletions(-)
> ---
> base-commit: 4d4d6605de5f91a40335729b6a7cc15e83b280f3
> change-id: 20260512-tls-session-tags-9d0042583f44
>
> Best regards,
> --
> Chuck Lever <chuck.lever@oracle.com>
I was wanting to review this, but I can't seem to get it to apply
cleanly to any known tree. What tree is this based on?
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD)
2026-06-06 13:26 ` [PATCH 0/9] Deliver TLS session tags to upper-layer consumers (NFSD) Jeff Layton
@ 2026-06-06 14:43 ` Chuck Lever
0 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2026-06-06 14:43 UTC (permalink / raw)
To: Jeff Layton, Donald Hunter, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan, Andrew Morton, John Fastabend, Sabrina Dubroca,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: kernel-tls-handshake, netdev, linux-nvme, linux-nfs, Chuck Lever
On Sat, Jun 6, 2026, at 6:26 AM, Jeff Layton wrote:
> I was wanting to review this, but I can't seem to get it to apply
> cleanly to any known tree. What tree is this based on?
commit 4d4d6605de5f91a40335729b6a7cc15e83b280f3 (cel/nfsd-testing)
Author: Chuck Lever <chuck.lever@oracle.com>
AuthorDate: Thu Sep 5 15:25:37 2024 -0400
Commit: Chuck Lever <chuck.lever@oracle.com>
CommitDate: Thu May 28 11:34:51 2026 -0400
That's some old shit.
I will rebase it before posting it again.
--
Chuck Lever