From: Willy Tarreau <w@1wt.eu>
To: Xin Long <lucien.xin@gmail.com>
Cc: network dev <netdev@vger.kernel.org>,
linux-sctp@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
Zero Day Initiative <zdi-disclosures@trendmicro.com>
Subject: Re: [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag
Date: Sat, 13 Jun 2026 09:10:41 +0200 [thread overview]
Message-ID: <ai0CcdDCkhBO_RF2@1wt.eu> (raw)
In-Reply-To: <1bbd999cf88fb1ea93f4a3743047bce9b51adc24.1781287178.git.lucien.xin@gmail.com>
Hi,
On Fri, Jun 12, 2026 at 01:59:38PM -0400, Xin Long wrote:
> SCTP_DIAG endpoint dumping currently walks the endpoint hash table
> without taking the socket lock before calling inet_sctp_diag_fill().
>
> This is problematic because inet_sctp_diag_fill() eventually calls
> inet_diag_msg_sctpladdrs_fill(), which traverses the endpoint's local
> address list twice: once to count entries for nla_reserve(), and once
> again to copy the addresses into the netlink buffer.
>
> Since these two traversals are protected only by separate RCU read-side
> critical sections, concurrent socket operations such as
> SCTP_SOCKOPT_BINDX_REM may remove entries from the address list between
> them. In that case, the number of copied addresses becomes smaller than
> the originally reserved buffer size, leaving part of the netlink payload
> uninitialized and potentially leaking kernel memory to user space.
>
> Fix this by changing sctp_for_each_endpoint() to iterate with net and
> position awareness while taking a reference on each socket, then release
> the endpoint hash bucket read_lock_bh() before invoking the callback.
>
> A socket reference is required because the callback acquires lock_sock(),
> which must be called outside of read_lock_bh() since lock_sock() may
> sleep. Holding a socket reference ensures the socket remains valid after
> dropping the bucket lock and before acquiring the socket lock.
>
> With the socket lock held, concurrent bind-address modifications are
> serialized against the diagnostic dump, ensuring the local address list
> remains stable during buffer sizing and initialization.
>
> This also simplifies endpoint traversal by removing the temporary
> callback local position tracking args[4] and moving dump progress
> tracking into sctp_for_each_endpoint() itself.
>
> While at it, fix the idiag_states check in sctp_ep_dump() and skip ep
> dumping when non LISTEN|CLOSE states are also requested and the ep has
> assocs, since such cases will be handled later by sctp_sock_dump().
>
> Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Please note that the original report suggested this reporter:
Nico Yip (@_cyeaa_) working with TrendAI Zero Day Initiative
> Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
> include/net/sctp/sctp.h | 3 +-
> net/sctp/diag.c | 62 +++++++++++++++++++----------------------
> net/sctp/socket.c | 34 +++++++++++++++++-----
> 3 files changed, 57 insertions(+), 42 deletions(-)
>
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 58242b37b47a..cd82b05354a3 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -111,7 +111,8 @@ int sctp_transport_lookup_process(sctp_callback_t cb, struct net *net,
> const union sctp_addr *paddr, void *p, int dif);
> int sctp_transport_traverse_process(sctp_callback_t cb, sctp_callback_t cb_done,
> struct net *net, int *pos, void *p);
> -int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p);
> +int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> + struct net *net, int *pos, void *p);
> int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
> struct sctp_info *info);
>
> diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> index d758f5c3e06e..9108272ca527 100644
> --- a/net/sctp/diag.c
> +++ b/net/sctp/diag.c
> @@ -92,6 +92,7 @@ static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb,
> if (!--addrcnt)
> break;
> }
> + WARN_ON_ONCE(addrcnt);
> rcu_read_unlock();
>
> return 0;
> @@ -373,42 +374,36 @@ static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
> struct sk_buff *skb = commp->skb;
> struct netlink_callback *cb = commp->cb;
> const struct inet_diag_req_v2 *r = commp->r;
> - struct net *net = sock_net(skb->sk);
> struct inet_sock *inet = inet_sk(sk);
> int err = 0;
>
> - if (!net_eq(sock_net(sk), net))
> + lock_sock(sk);
> + if (sctp_sstate(sk, CLOSED))
> goto out;
>
> - if (cb->args[4] < cb->args[1])
> - goto next;
> -
> - if (!(r->idiag_states & TCPF_LISTEN) && !list_empty(&ep->asocs))
> - goto next;
> + if ((r->idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)) &&
> + !list_empty(&ep->asocs))
> + goto out;
>
> if (r->sdiag_family != AF_UNSPEC &&
> sk->sk_family != r->sdiag_family)
> - goto next;
> + goto out;
>
> if (r->id.idiag_sport != inet->inet_sport &&
> r->id.idiag_sport)
> - goto next;
> + goto out;
>
> if (r->id.idiag_dport != inet->inet_dport &&
> r->id.idiag_dport)
> - goto next;
> -
> - if (inet_sctp_diag_fill(sk, NULL, skb, r,
> - sk_user_ns(NETLINK_CB(cb->skb).sk),
> - NETLINK_CB(cb->skb).portid,
> - cb->nlh->nlmsg_seq, NLM_F_MULTI,
> - cb->nlh, commp->net_admin) < 0) {
> - err = 2;
> goto out;
> - }
> -next:
> - cb->args[4]++;
> +
> + err = inet_sctp_diag_fill(sk, NULL, skb, r,
> + sk_user_ns(NETLINK_CB(cb->skb).sk),
> + NETLINK_CB(cb->skb).portid,
> + cb->nlh->nlmsg_seq, NLM_F_MULTI,
> + cb->nlh, commp->net_admin);
> out:
> + release_sock(sk);
> return err;
> }
>
> @@ -479,41 +474,40 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
> .r = r,
> .net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN),
> };
> - int pos = cb->args[2];
> + int pos;
>
> /* eps hashtable dumps
> * args:
> * 0 : if it will traversal listen sock
> * 1 : to record the sock pos of this time's traversal
> - * 4 : to work as a temporary variable to traversal list
> */
> if (cb->args[0] == 0) {
> - if (!(idiag_states & TCPF_LISTEN))
> - goto skip;
> - if (sctp_for_each_endpoint(sctp_ep_dump, &commp))
> - goto done;
> -skip:
> + if (idiag_states & TCPF_LISTEN) {
> + pos = cb->args[1];
> + if (sctp_for_each_endpoint(sctp_ep_dump, net, &pos,
> + &commp)) {
> + cb->args[1] = pos;
> + return;
> + }
> + }
> cb->args[0] = 1;
> cb->args[1] = 0;
> - cb->args[4] = 0;
> }
>
> + if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
> + return;
> +
> /* asocs by transport hashtable dump
> * args:
> * 1 : to record the assoc pos of this time's traversal
> * 2 : to record the transport pos of this time's traversal
> * 3 : to mark if we have dumped the ep info of the current asoc
> * 4 : to work as a temporary variable to traversal list
> - * 5 : to save the sk we get from travelsing the tsp list.
> */
> - if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
> - goto done;
> -
> + pos = cb->args[2];
> sctp_transport_traverse_process(sctp_sock_filter, sctp_sock_dump,
> net, &pos, &commp);
> cb->args[2] = pos;
> -
> -done:
> cb->args[1] = cb->args[4];
> cb->args[4] = 0;
> }
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 66e12fb0c646..1ed405dedc01 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5369,24 +5369,44 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
> }
>
> int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> - void *p) {
> - int err = 0;
> - int hash = 0;
> - struct sctp_endpoint *ep;
> + struct net *net, int *pos, void *p) {
> + int err, hash = 0, idx = 0, start;
> struct sctp_hashbucket *head;
> + struct sctp_endpoint *ep;
> + struct sock *sk;
>
> for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
> hash++, head++) {
> + start = idx;
> +again:
> + sk = NULL;
> read_lock_bh(&head->lock);
> sctp_for_each_hentry(ep, &head->chain) {
> - err = cb(ep, p);
> - if (err)
> + if (sock_net(ep->base.sk) != net)
> + continue;
> + if (idx++ >= *pos) {
> + sk = ep->base.sk;
> + sock_hold(sk);
> break;
> + }
> }
> read_unlock_bh(&head->lock);
> +
> + if (sk) {
> + err = cb(ep, p);
> + if (err) {
> + sock_put(sk);
> + return err;
> + }
> + sock_put(sk);
> + (*pos)++;
> +
> + idx = start;
> + goto again;
> + }
> }
>
> - return err;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
>
> --
> 2.47.1
prev parent reply other threads:[~2026-06-13 7:10 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-12 17:59 [PATCH net] sctp: hold socket lock when dumping endpoints in sctp_diag Xin Long
2026-06-13 7:10 ` Willy Tarreau [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ai0CcdDCkhBO_RF2@1wt.eu \
--to=w@1wt.eu \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-sctp@vger.kernel.org \
--cc=lucien.xin@gmail.com \
--cc=marcelo.leitner@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=zdi-disclosures@trendmicro.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox