public inbox for netdev@vger.kernel.org
From: Allison Henderson <allison.henderson@oracle.com>
To: "horms@kernel.org" <horms@kernel.org>,
	"achender@kernel.org" <achender@kernel.org>
Cc: "pabeni@redhat.com" <pabeni@redhat.com>,
	"rds-devel@oss.oracle.com" <rds-devel@oss.oracle.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [net-next,v5,1/2] net/rds: Refactor __rds_conn_create for blocking transport cleanup
Date: Wed, 25 Feb 2026 06:44:57 +0000	[thread overview]
Message-ID: <8d2d61150cc4f8ba6729bb4786a28817aad2c15d.camel@oracle.com> (raw)
In-Reply-To: <20260224161607.143920-1-horms@kernel.org>

On Tue, 2026-02-24 at 16:16 +0000, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> 
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net/rds: Refactor __rds_conn_create for blocking transport cleanup
> 
> The next patch will delegate fanout operations to a background worker,
> which adds cancel_work_sync() to rds_tcp_conn_free(). This is called
> during connection cleanup and requires the operation to be able to block.
> However, the error path of __rds_conn_create() currently calls
> trans->conn_free() while holding rds_conn_lock (spinlock) and
> rcu_read_lock, which creates an atomic context where cancel_work_sync()
> cannot sleep.
> 
> Link: https://ci.syzbot.org/series/1a5ef180-c02c-401d-9df7-670b18570a55
> 
> > diff --git a/net/rds/connection.c b/net/rds/connection.c
> > --- a/net/rds/connection.c
> > +++ b/net/rds/connection.c
> > @@ -170,6 +170,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
> >  	struct hlist_head *head = rds_conn_bucket(laddr, faddr);
> >  	struct rds_transport *loop_trans;
> >  	struct rds_conn_path *free_cp = NULL;
> > +	struct rds_transport *free_trans = NULL;
> >  	unsigned long flags;
> >  	int ret, i;
> >  	int npaths = (trans->t_mp_capable ? RDS_MPATH_WORKERS : 1);
> > @@ -305,7 +306,7 @@
> >  	if (parent) {
> >  		/* Creating passive conn */
> >  		if (parent->c_passive) {
> > -			trans->conn_free(conn->c_path[0].cp_transport_data);
> > +			free_trans = trans;
> >  			free_cp = conn->c_path;
> >  			kmem_cache_free(rds_conn_slab, conn);
> >  			conn = parent->c_passive;
> 
> [ ... ]
> 
> > @@ -321,18 +322,7 @@
> >  		found = rds_conn_lookup(net, head, laddr, faddr, trans,
> >  					tos, dev_if);
> >  		if (found) {
> > -			struct rds_conn_path *cp;
> > -			int i;
> > -
> > -			for (i = 0; i < npaths; i++) {
> > -				cp = &conn->c_path[i];
> > -				/* The ->conn_alloc invocation may have
> > -				 * allocated resource for all paths, so all
> > -				 * of them may have to be freed here.
> > -				 */
> > -				if (cp->cp_transport_data)
> > -					trans->conn_free(cp->cp_transport_data);
> > -			}
> > +			free_trans = trans;
> >  			free_cp = conn->c_path;
> >  			kmem_cache_free(rds_conn_slab, conn);
> >  			conn = found;
> 
> Does this create a use-after-free in other threads that iterate the
> transport list?
> 
> Looking at the race-loss path when rds_conn_lookup() finds an existing
> connection, the old code called trans->conn_free() before
> kmem_cache_free(conn). Now the patch defers conn_free() to after
> locks are released at the out: label, but kmem_cache_free(conn) still
> executes inside the critical section (line 327 after this change).
> 
> This creates a window where rds_tcp_connection objects remain on
> rds_tcp_conn_list with tc->t_cpath->cp_conn pointing to the
> already-freed conn object.
> 
> The timeline would be:
> 
> CPU 0 (__rds_conn_create race-loss)
>   spin_lock_irqsave(&rds_conn_lock)
>   found = rds_conn_lookup()  // finds existing conn
>   free_trans = trans
>   free_cp = conn->c_path
>   kmem_cache_free(rds_conn_slab, conn)  // conn freed
>   conn = found
>   spin_unlock_irqrestore(&rds_conn_lock)
>   rcu_read_unlock()
>   // ... later at out: label
>   free_trans->conn_free(...)  // removes tc from list, but too late
> 
> CPU 1 (rds_tcp_kill_sock)
>   spin_lock_irq(&rds_tcp_conn_lock)
>   list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node)
>     c_net = read_pnet(&tc->t_cpath->cp_conn->c_net)  // UAF here!
> 
> The commit message states "connections that 'lose' the race are safe to
> clean up outside the critical section since they were never added to
> the hashtable, and therefore are inaccessible to other threads."
> 
> However, while the rds_connection was never added to the RDS hashtable,
> the transport data (tc) WAS added to rds_tcp_conn_list during
> rds_tcp_conn_alloc(). Looking at that function in net/rds/tcp.c:
> 
> rds_tcp_conn_alloc() {
>     // ... allocate tc structures ...
>     spin_lock_irq(&rds_tcp_conn_lock);
>     for (i = 0; i < RDS_MPATH_WORKERS; i++) {
>         tc = conn->c_path[i].cp_transport_data;
>         tc->t_tcp_node_detached = false;
>         list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
>     }
>     spin_unlock_irq(&rds_tcp_conn_lock);
> }
> 
> So the tc objects are accessible to other threads via the transport
> list, even though the conn itself never made it into the RDS hashtable.
> 
> Functions that iterate rds_tcp_conn_list include:
> - rds_tcp_kill_sock() at net/rds/tcp.c:610
> - rds_tcp_destroy_conns()
> - rds_tcp_sysctl_reset()
> 
> All of these dereference tc->t_cpath->cp_conn during iteration.
> 
> In the old code, conn_free() ran BEFORE kmem_cache_free(conn), so the tc
> was removed from the list before conn was freed. No UAF window existed.
> 
> Could the kmem_cache_free(rds_conn_slab, conn) be moved to after
> conn_free() at the out: label, or should conn_free() be called before
> kmem_cache_free() inside the critical section for the race-loss paths?
I think it probably needs to go after the out: label.  Will update.  Thank you!

Allison

> 
> > @@ -349,9 +339,23 @@
> > 
> >  out:
> >  	if (free_cp) {
> > -		for (i = 0; i < npaths; i++)
> > +		for (i = 0; i < npaths; i++) {
> > +			/*
> > +			 * The trans->conn_alloc call may have allocated
> > +			 * resources for the cp paths, which will need to
> > +			 * be freed before freeing cp itself.  We do this here
> > +			 * after the out: label so it runs outside the
> > +			 * rds_conn_lock spinlock and rcu_read_lock section,
> > +			 * since both destroy_workqueue() and conn_free can
> > +			 * block. The local free_trans pointer is only set in
> > +			 * the race-loss paths where conn_alloc() succeeded.
> > +			 */
> > +			if (free_trans && free_cp[i].cp_transport_data)
> > +				free_trans->conn_free
> > +					(free_cp[i].cp_transport_data);
> >  			if (free_cp[i].cp_wq != rds_wq)
> >  				destroy_workqueue(free_cp[i].cp_wq);
> > +		}
> >  		kfree(free_cp);
> >  	}


Thread overview: 7+ messages
2026-02-23 22:19 [PATCH net-next v5 0/2] net/rds: RDS-TCP reconnect and fanout improvements Allison Henderson
2026-02-23 22:19 ` [PATCH net-next v5 1/2] net/rds: Refactor __rds_conn_create for blocking transport cleanup Allison Henderson
2026-02-24 16:16   ` [net-next,v5,1/2] " Simon Horman
2026-02-25  6:44     ` Allison Henderson [this message]
2026-02-23 22:19 ` [PATCH net-next v5 2/2] net/rds: Delegate fan-out to a background worker Allison Henderson
2026-02-25 16:33   ` Fernando Fernandez Mancera
2026-02-26 17:06     ` Allison Henderson
