* [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag
@ 2026-06-12 17:59 Xin Long
2026-06-13 7:10 ` Willy Tarreau
2026-06-15 11:04 ` Simon Horman
0 siblings, 2 replies; 3+ messages in thread
From: Xin Long @ 2026-06-12 17:59 UTC (permalink / raw)
To: network dev, linux-sctp
Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Simon Horman,
Marcelo Ricardo Leitner, Willy Tarreau, Zero Day Initiative
SCTP_DIAG endpoint dumping currently walks the endpoint hash table
without taking the socket lock before calling inet_sctp_diag_fill().
This is problematic because inet_sctp_diag_fill() eventually calls
inet_diag_msg_sctpladdrs_fill(), which traverses the endpoint's local
address list twice: once to count entries for nla_reserve(), and once
again to copy the addresses into the netlink buffer.
Since these two traversals are protected only by separate RCU read-side
critical sections, concurrent socket operations such as
SCTP_SOCKOPT_BINDX_REM may remove entries from the address list between
them. In that case, the number of copied addresses becomes smaller than
the originally reserved buffer size, leaving part of the netlink payload
uninitialized and potentially leaking kernel memory to user space.
Fix this by changing sctp_for_each_endpoint() to iterate with net and
position awareness while taking a reference on each socket, then release
the endpoint hash bucket read_lock_bh() before invoking the callback.
A socket reference is required because the callback acquires lock_sock(),
which must be called outside of read_lock_bh() since lock_sock() may
sleep. Holding a socket reference ensures the socket remains valid after
dropping the bucket lock and before acquiring the socket lock.
With the socket lock held, concurrent bind-address modifications are
serialized against the diagnostic dump, ensuring the local address list
remains stable during buffer sizing and initialization.
This also simplifies endpoint traversal by removing the temporary
callback local position tracking args[4] and moving dump progress
tracking into sctp_for_each_endpoint() itself.
While at it, fix the idiag_states check in sctp_ep_dump() and skip ep
dumping when non LISTEN|CLOSE states are also requested and the ep has
assocs, since such cases will be handled later by sctp_sock_dump().
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
include/net/sctp/sctp.h | 3 +-
net/sctp/diag.c | 62 +++++++++++++++++++----------------------
net/sctp/socket.c | 34 +++++++++++++++++-----
3 files changed, 57 insertions(+), 42 deletions(-)
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 58242b37b47a..cd82b05354a3 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -111,7 +111,8 @@ int sctp_transport_lookup_process(sctp_callback_t cb, struct net *net,
const union sctp_addr *paddr, void *p, int dif);
int sctp_transport_traverse_process(sctp_callback_t cb, sctp_callback_t cb_done,
struct net *net, int *pos, void *p);
-int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p);
+int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
+ struct net *net, int *pos, void *p);
int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
struct sctp_info *info);
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index d758f5c3e06e..9108272ca527 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -92,6 +92,7 @@ static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb,
if (!--addrcnt)
break;
}
+ WARN_ON_ONCE(addrcnt);
rcu_read_unlock();
return 0;
@@ -373,42 +374,36 @@ static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
struct sk_buff *skb = commp->skb;
struct netlink_callback *cb = commp->cb;
const struct inet_diag_req_v2 *r = commp->r;
- struct net *net = sock_net(skb->sk);
struct inet_sock *inet = inet_sk(sk);
int err = 0;
- if (!net_eq(sock_net(sk), net))
+ lock_sock(sk);
+ if (sctp_sstate(sk, CLOSED))
goto out;
- if (cb->args[4] < cb->args[1])
- goto next;
-
- if (!(r->idiag_states & TCPF_LISTEN) && !list_empty(&ep->asocs))
- goto next;
+ if ((r->idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)) &&
+ !list_empty(&ep->asocs))
+ goto out;
if (r->sdiag_family != AF_UNSPEC &&
sk->sk_family != r->sdiag_family)
- goto next;
+ goto out;
if (r->id.idiag_sport != inet->inet_sport &&
r->id.idiag_sport)
- goto next;
+ goto out;
if (r->id.idiag_dport != inet->inet_dport &&
r->id.idiag_dport)
- goto next;
-
- if (inet_sctp_diag_fill(sk, NULL, skb, r,
- sk_user_ns(NETLINK_CB(cb->skb).sk),
- NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq, NLM_F_MULTI,
- cb->nlh, commp->net_admin) < 0) {
- err = 2;
goto out;
- }
-next:
- cb->args[4]++;
+
+ err = inet_sctp_diag_fill(sk, NULL, skb, r,
+ sk_user_ns(NETLINK_CB(cb->skb).sk),
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, NLM_F_MULTI,
+ cb->nlh, commp->net_admin);
out:
+ release_sock(sk);
return err;
}
@@ -479,41 +474,40 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
.r = r,
.net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN),
};
- int pos = cb->args[2];
+ int pos;
/* eps hashtable dumps
* args:
* 0 : if it will traversal listen sock
* 1 : to record the sock pos of this time's traversal
- * 4 : to work as a temporary variable to traversal list
*/
if (cb->args[0] == 0) {
- if (!(idiag_states & TCPF_LISTEN))
- goto skip;
- if (sctp_for_each_endpoint(sctp_ep_dump, &commp))
- goto done;
-skip:
+ if (idiag_states & TCPF_LISTEN) {
+ pos = cb->args[1];
+ if (sctp_for_each_endpoint(sctp_ep_dump, net, &pos,
+ &commp)) {
+ cb->args[1] = pos;
+ return;
+ }
+ }
cb->args[0] = 1;
cb->args[1] = 0;
- cb->args[4] = 0;
}
+ if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
+ return;
+
/* asocs by transport hashtable dump
* args:
* 1 : to record the assoc pos of this time's traversal
* 2 : to record the transport pos of this time's traversal
* 3 : to mark if we have dumped the ep info of the current asoc
* 4 : to work as a temporary variable to traversal list
- * 5 : to save the sk we get from travelsing the tsp list.
*/
- if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
- goto done;
-
+ pos = cb->args[2];
sctp_transport_traverse_process(sctp_sock_filter, sctp_sock_dump,
net, &pos, &commp);
cb->args[2] = pos;
-
-done:
cb->args[1] = cb->args[4];
cb->args[4] = 0;
}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 66e12fb0c646..1ed405dedc01 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5369,24 +5369,44 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
}
int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
- void *p) {
- int err = 0;
- int hash = 0;
- struct sctp_endpoint *ep;
+ struct net *net, int *pos, void *p) {
+ int err, hash = 0, idx = 0, start;
struct sctp_hashbucket *head;
+ struct sctp_endpoint *ep;
+ struct sock *sk;
for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
hash++, head++) {
+ start = idx;
+again:
+ sk = NULL;
read_lock_bh(&head->lock);
sctp_for_each_hentry(ep, &head->chain) {
- err = cb(ep, p);
- if (err)
+ if (sock_net(ep->base.sk) != net)
+ continue;
+ if (idx++ >= *pos) {
+ sk = ep->base.sk;
+ sock_hold(sk);
break;
+ }
}
read_unlock_bh(&head->lock);
+
+ if (sk) {
+ err = cb(ep, p);
+ if (err) {
+ sock_put(sk);
+ return err;
+ }
+ sock_put(sk);
+ (*pos)++;
+
+ idx = start;
+ goto again;
+ }
}
- return err;
+ return 0;
}
EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
--
2.47.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag
2026-06-12 17:59 [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag Xin Long
@ 2026-06-13 7:10 ` Willy Tarreau
2026-06-15 11:04 ` Simon Horman
1 sibling, 0 replies; 3+ messages in thread
From: Willy Tarreau @ 2026-06-13 7:10 UTC (permalink / raw)
To: Xin Long
Cc: network dev, linux-sctp, davem, kuba, Eric Dumazet, Paolo Abeni,
Simon Horman, Marcelo Ricardo Leitner, Zero Day Initiative
Hi,
On Fri, Jun 12, 2026 at 01:59:38PM -0400, Xin Long wrote:
> SCTP_DIAG endpoint dumping currently walks the endpoint hash table
> without taking the socket lock before calling inet_sctp_diag_fill().
>
> This is problematic because inet_sctp_diag_fill() eventually calls
> inet_diag_msg_sctpladdrs_fill(), which traverses the endpoint's local
> address list twice: once to count entries for nla_reserve(), and once
> again to copy the addresses into the netlink buffer.
>
> Since these two traversals are protected only by separate RCU read-side
> critical sections, concurrent socket operations such as
> SCTP_SOCKOPT_BINDX_REM may remove entries from the address list between
> them. In that case, the number of copied addresses becomes smaller than
> the originally reserved buffer size, leaving part of the netlink payload
> uninitialized and potentially leaking kernel memory to user space.
>
> Fix this by changing sctp_for_each_endpoint() to iterate with net and
> position awareness while taking a reference on each socket, then release
> the endpoint hash bucket read_lock_bh() before invoking the callback.
>
> A socket reference is required because the callback acquires lock_sock(),
> which must be called outside of read_lock_bh() since lock_sock() may
> sleep. Holding a socket reference ensures the socket remains valid after
> dropping the bucket lock and before acquiring the socket lock.
>
> With the socket lock held, concurrent bind-address modifications are
> serialized against the diagnostic dump, ensuring the local address list
> remains stable during buffer sizing and initialization.
>
> This also simplifies endpoint traversal by removing the temporary
> callback local position tracking args[4] and moving dump progress
> tracking into sctp_for_each_endpoint() itself.
>
> While at it, fix the idiag_states check in sctp_ep_dump() and skip ep
> dumping when non LISTEN|CLOSE states are also requested and the ep has
> assocs, since such cases will be handled later by sctp_sock_dump().
>
> Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Please note that the original report suggested this reporter:
Nico Yip (@_cyeaa_) working with TrendAI Zero Day Initiative
> Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
> include/net/sctp/sctp.h | 3 +-
> net/sctp/diag.c | 62 +++++++++++++++++++----------------------
> net/sctp/socket.c | 34 +++++++++++++++++-----
> 3 files changed, 57 insertions(+), 42 deletions(-)
>
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 58242b37b47a..cd82b05354a3 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -111,7 +111,8 @@ int sctp_transport_lookup_process(sctp_callback_t cb, struct net *net,
> const union sctp_addr *paddr, void *p, int dif);
> int sctp_transport_traverse_process(sctp_callback_t cb, sctp_callback_t cb_done,
> struct net *net, int *pos, void *p);
> -int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p);
> +int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> + struct net *net, int *pos, void *p);
> int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
> struct sctp_info *info);
>
> diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> index d758f5c3e06e..9108272ca527 100644
> --- a/net/sctp/diag.c
> +++ b/net/sctp/diag.c
> @@ -92,6 +92,7 @@ static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb,
> if (!--addrcnt)
> break;
> }
> + WARN_ON_ONCE(addrcnt);
> rcu_read_unlock();
>
> return 0;
> @@ -373,42 +374,36 @@ static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
> struct sk_buff *skb = commp->skb;
> struct netlink_callback *cb = commp->cb;
> const struct inet_diag_req_v2 *r = commp->r;
> - struct net *net = sock_net(skb->sk);
> struct inet_sock *inet = inet_sk(sk);
> int err = 0;
>
> - if (!net_eq(sock_net(sk), net))
> + lock_sock(sk);
> + if (sctp_sstate(sk, CLOSED))
> goto out;
>
> - if (cb->args[4] < cb->args[1])
> - goto next;
> -
> - if (!(r->idiag_states & TCPF_LISTEN) && !list_empty(&ep->asocs))
> - goto next;
> + if ((r->idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)) &&
> + !list_empty(&ep->asocs))
> + goto out;
>
> if (r->sdiag_family != AF_UNSPEC &&
> sk->sk_family != r->sdiag_family)
> - goto next;
> + goto out;
>
> if (r->id.idiag_sport != inet->inet_sport &&
> r->id.idiag_sport)
> - goto next;
> + goto out;
>
> if (r->id.idiag_dport != inet->inet_dport &&
> r->id.idiag_dport)
> - goto next;
> -
> - if (inet_sctp_diag_fill(sk, NULL, skb, r,
> - sk_user_ns(NETLINK_CB(cb->skb).sk),
> - NETLINK_CB(cb->skb).portid,
> - cb->nlh->nlmsg_seq, NLM_F_MULTI,
> - cb->nlh, commp->net_admin) < 0) {
> - err = 2;
> goto out;
> - }
> -next:
> - cb->args[4]++;
> +
> + err = inet_sctp_diag_fill(sk, NULL, skb, r,
> + sk_user_ns(NETLINK_CB(cb->skb).sk),
> + NETLINK_CB(cb->skb).portid,
> + cb->nlh->nlmsg_seq, NLM_F_MULTI,
> + cb->nlh, commp->net_admin);
> out:
> + release_sock(sk);
> return err;
> }
>
> @@ -479,41 +474,40 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
> .r = r,
> .net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN),
> };
> - int pos = cb->args[2];
> + int pos;
>
> /* eps hashtable dumps
> * args:
> * 0 : if it will traversal listen sock
> * 1 : to record the sock pos of this time's traversal
> - * 4 : to work as a temporary variable to traversal list
> */
> if (cb->args[0] == 0) {
> - if (!(idiag_states & TCPF_LISTEN))
> - goto skip;
> - if (sctp_for_each_endpoint(sctp_ep_dump, &commp))
> - goto done;
> -skip:
> + if (idiag_states & TCPF_LISTEN) {
> + pos = cb->args[1];
> + if (sctp_for_each_endpoint(sctp_ep_dump, net, &pos,
> + &commp)) {
> + cb->args[1] = pos;
> + return;
> + }
> + }
> cb->args[0] = 1;
> cb->args[1] = 0;
> - cb->args[4] = 0;
> }
>
> + if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
> + return;
> +
> /* asocs by transport hashtable dump
> * args:
> * 1 : to record the assoc pos of this time's traversal
> * 2 : to record the transport pos of this time's traversal
> * 3 : to mark if we have dumped the ep info of the current asoc
> * 4 : to work as a temporary variable to traversal list
> - * 5 : to save the sk we get from travelsing the tsp list.
> */
> - if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
> - goto done;
> -
> + pos = cb->args[2];
> sctp_transport_traverse_process(sctp_sock_filter, sctp_sock_dump,
> net, &pos, &commp);
> cb->args[2] = pos;
> -
> -done:
> cb->args[1] = cb->args[4];
> cb->args[4] = 0;
> }
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 66e12fb0c646..1ed405dedc01 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5369,24 +5369,44 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
> }
>
> int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> - void *p) {
> - int err = 0;
> - int hash = 0;
> - struct sctp_endpoint *ep;
> + struct net *net, int *pos, void *p) {
> + int err, hash = 0, idx = 0, start;
> struct sctp_hashbucket *head;
> + struct sctp_endpoint *ep;
> + struct sock *sk;
>
> for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
> hash++, head++) {
> + start = idx;
> +again:
> + sk = NULL;
> read_lock_bh(&head->lock);
> sctp_for_each_hentry(ep, &head->chain) {
> - err = cb(ep, p);
> - if (err)
> + if (sock_net(ep->base.sk) != net)
> + continue;
> + if (idx++ >= *pos) {
> + sk = ep->base.sk;
> + sock_hold(sk);
> break;
> + }
> }
> read_unlock_bh(&head->lock);
> +
> + if (sk) {
> + err = cb(ep, p);
> + if (err) {
> + sock_put(sk);
> + return err;
> + }
> + sock_put(sk);
> + (*pos)++;
> +
> + idx = start;
> + goto again;
> + }
> }
>
> - return err;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
>
> --
> 2.47.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag
2026-06-12 17:59 [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag Xin Long
2026-06-13 7:10 ` Willy Tarreau
@ 2026-06-15 11:04 ` Simon Horman
1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2026-06-15 11:04 UTC (permalink / raw)
To: lucien.xin
Cc: Simon Horman, netdev, linux-sctp, davem, kuba, edumazet, pabeni,
marcelo.leitner, w, zdi-disclosures
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
sctp: hold socket lock when dumping endpoints in sctp_diag
This change reworks SCTP_DIAG endpoint dumping so the socket lock is held
across inet_sctp_diag_fill() to keep the bind address list stable, and
moves position tracking into sctp_for_each_endpoint() which now drops the
bucket read_lock_bh() before invoking the callback.
A few questions on the new iterator and dump callback below.
> diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> index d758f5c3e06e..9108272ca527 100644
> --- a/net/sctp/diag.c
> +++ b/net/sctp/diag.c
> @@ -373,42 +374,36 @@ static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
> struct sk_buff *skb = commp->skb;
> struct netlink_callback *cb = commp->cb;
> const struct inet_diag_req_v2 *r = commp->r;
> - struct net *net = sock_net(skb->sk);
> struct inet_sock *inet = inet_sk(sk);
> int err = 0;
>
> - if (!net_eq(sock_net(sk), net))
> + lock_sock(sk);
> + if (sctp_sstate(sk, CLOSED))
> goto out;
[Low]
The new unconditional skip on sctp_sstate(sk, CLOSED) does not appear to
be described in the changelog.
SCTP_SS_CLOSED maps to TCP_CLOSE, so a userspace caller that asks for
TCPF_CLOSE in idiag_states will silently receive nothing from the
endpoint hash for entries in that state.
Most hashed endpoints are LISTENING (sctp_hash_endpoint() is called from
sctp_listen_start() right after inet_sk_set_state(sk, SCTP_SS_LISTENING))
so the practical effect is small, but is the intent to filter teardown
windows (in which case ep->base.dead might be a more direct check) or to
honour the user-requested idiag_states bitmask? Either way, would it be
worth calling this behaviour change out in the commit message?
>
> - if (cb->args[4] < cb->args[1])
> - goto next;
> -
> - if (!(r->idiag_states & TCPF_LISTEN) && !list_empty(&ep->asocs))
> - goto next;
> + if ((r->idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)) &&
> + !list_empty(&ep->asocs))
> + goto out;
[ ... ]
> @@ -479,41 +474,40 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
> .r = r,
> .net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN),
> };
> - int pos = cb->args[2];
> + int pos;
>
> /* eps hashtable dumps
> * args:
> * 0 : if it will traversal listen sock
> * 1 : to record the sock pos of this time's traversal
> - * 4 : to work as a temporary variable to traversal list
> */
> if (cb->args[0] == 0) {
> - if (!(idiag_states & TCPF_LISTEN))
> - goto skip;
> - if (sctp_for_each_endpoint(sctp_ep_dump, &commp))
> - goto done;
> -skip:
> + if (idiag_states & TCPF_LISTEN) {
> + pos = cb->args[1];
> + if (sctp_for_each_endpoint(sctp_ep_dump, net, &pos,
> + &commp)) {
> + cb->args[1] = pos;
> + return;
> + }
> + }
> cb->args[0] = 1;
> cb->args[1] = 0;
> - cb->args[4] = 0;
> }
>
> + if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
> + return;
> +
> /* asocs by transport hashtable dump
> * args:
> * 1 : to record the assoc pos of this time's traversal
> * 2 : to record the transport pos of this time's traversal
> * 3 : to mark if we have dumped the ep info of the current asoc
> * 4 : to work as a temporary variable to traversal list
> - * 5 : to save the sk we get from travelsing the tsp list.
> */
> - if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
> - goto done;
> -
> + pos = cb->args[2];
> sctp_transport_traverse_process(sctp_sock_filter, sctp_sock_dump,
> net, &pos, &commp);
> cb->args[2] = pos;
> -
> -done:
> cb->args[1] = cb->args[4];
> cb->args[4] = 0;
> }
[Low]
The commit message states that args[4] tracking is removed, but the
asocs-phase comment block still lists:
* 4 : to work as a temporary variable to traversal list
and the trailing epilogue still does:
cb->args[1] = cb->args[4];
cb->args[4] = 0;
Are those leftovers from the eps-phase rewrite, or are they still needed
by the asocs phase? If args[4] is genuinely gone, should the comment
and the epilogue be cleaned up too?
The previous comment also listed:
* 5 : to save the sk we get from travelsing the tsp list.
which is dropped here without further mention. Is args[5] confirmed
unused on the asocs path?
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 66e12fb0c646..1ed405dedc01 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5369,24 +5369,44 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
> }
>
> int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> - void *p) {
> - int err = 0;
> - int hash = 0;
> - struct sctp_endpoint *ep;
> + struct net *net, int *pos, void *p) {
> + int err, hash = 0, idx = 0, start;
> struct sctp_hashbucket *head;
> + struct sctp_endpoint *ep;
> + struct sock *sk;
>
> for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
> hash++, head++) {
> + start = idx;
> +again:
> + sk = NULL;
> read_lock_bh(&head->lock);
> sctp_for_each_hentry(ep, &head->chain) {
> - err = cb(ep, p);
> - if (err)
> + if (sock_net(ep->base.sk) != net)
> + continue;
> + if (idx++ >= *pos) {
> + sk = ep->base.sk;
> + sock_hold(sk);
> break;
> + }
> }
> read_unlock_bh(&head->lock);
> +
> + if (sk) {
> + err = cb(ep, p);
[Critical]
Can ep be freed before this cb(ep, p) call?
The bucket read_lock_bh() is dropped above, and only sock_hold(sk) is
taken. In sctp_endpoint_init() the reference direction is
ep -> sk:
refcount_set(&ep->base.refcnt, 1);
...
sock_hold(sk);
so taking sock_hold(sk) does not pin ep.
A concurrent close(2) on the same socket runs:
sctp_close()
-> release_sock() /* lock_sock dropped here */
-> sk_common_release()
-> sctp_destroy_sock()
-> sctp_endpoint_free()
-> sctp_unhash_endpoint()
-> sctp_endpoint_put() /* refcount may reach 0 */
-> call_rcu(&ep->rcu, sctp_endpoint_destroy_rcu)
The destroy callback then does sctp_sk(sk)->ep = NULL; sock_put(sk);
kfree(ep).
Between read_unlock_bh() and cb(ep, p), this iterator is no longer in
any RCU read-side section, and sctp_ep_dump() then calls lock_sock(sk)
which may sleep. That allows an RCU grace period to elapse before the
callback dereferences ep at:
struct sock *sk = ep->base.sk;
...
if ((r->idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)) &&
!list_empty(&ep->asocs))
The sibling helpers in the same file pin the endpoint explicitly to
avoid this:
sctp_transport_traverse_process()
if (sctp_endpoint_hold(ep)) { /* asoc can be peeled off */
ret = cb(ep, tsp, p);
...
sctp_endpoint_put(ep);
}
Should sctp_for_each_endpoint() take sctp_endpoint_hold(ep) inside the
bucket read_lock_bh() (while ep is still hashed and live) and a matching
sctp_endpoint_put(ep) after cb returns?
[Medium]
Can this restart pattern miss or duplicate endpoints when the bucket is
modified concurrently?
After processing one ep the loop does:
idx = start;
goto again;
which re-walks the bucket from the head and uses the running counter
idx compared against *pos to skip already-emitted entries.
__sctp_hash_endpoint() inserts at the hlist head and
__sctp_unhash_endpoint() can delete arbitrary nodes, both under the
bucket write_lock that this iterator releases between walks. A
concurrent listen() on a sibling endpoint shifts every existing entry's
index by +1, and a concurrent close() before the cursor shifts indices
by -1.
Under that pattern, can SCTP_DIAG consumers observe duplicated
endpoints (the previously-emitted entry now sits at the new *pos) or
miss live endpoints (the next un-emitted entry is shifted past the
cursor)? The pre-patch sweep walked each bucket once with read_lock_bh
held throughout, so this consistency window did not exist.
[Low]
For a bucket that contains N matching endpoints, does this restart
pattern walk the bucket O(N^2) times?
For a bucket {A,B,C} starting from *pos=0 the goto-again loop performs
1+2+3 traversals to dump three entries plus a final 3-entry walk that
finds nothing and terminates with sk == NULL, which is nine list
iterations to dump three endpoints. The pre-patch code walked each
bucket exactly once because the callback ran while still holding
read_lock_bh().
The commit message describes the change as simplifying endpoint
traversal; could the description also note the new per-bucket scan
cost, or could a cursor that survives unlock/relock be used instead?
> + if (err) {
> + sock_put(sk);
> + return err;
> + }
> + sock_put(sk);
> + (*pos)++;
> +
> + idx = start;
> + goto again;
> + }
> }
>
> - return err;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-15 11:04 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 17:59 [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag Xin Long
2026-06-13 7:10 ` Willy Tarreau
2026-06-15 11:04 ` Simon Horman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox