public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
       [not found] ` <20250517035120.55560-5-kuniyu@amazon.com>
@ 2025-05-22  8:55   ` Paolo Abeni
  2025-05-22 16:12     ` Kuniyuki Iwashima
                       ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-05-22  8:55 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Willem de Bruijn
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, Chuck Lever, Jeff Layton,
	linux-nfs, Keith Busch, Jens Axboe, Christoph Hellwig,
	Wenjia Zhang, Jan Karcher, Steve French,
	linux-rdma@vger.kernel.org, linux-nvme, Matthieu Baerts,
	MPTCP Linux

On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
> count the netns of kernel sockets."), TCP kernel socket has caused
> many UAF.
> 
> We have converted such sockets to hold netns refcnt, and we have
> the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
> 
>   __sock_create_kern(..., &sock);
>   sk_net_refcnt_upgrade(sock->sk);
> 
> Let's drop the conversion and use sock_create_kern() instead.
> 
> The changes for cifs, mptcp, nvme, and smc are straightforward.
> 
> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call __sock_create_kern() for others.
> 
> For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
> sockets.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

This LGTM, but is touching a few other subsystems, it would be great to
collect acks from the relevant maintainers: I'm adding a few CCs.

Direct link to the series:

https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t

> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index 37a2ba38f10e..c7b4f5a7cca1 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
>  		socket = server->ssocket;
>  	} else {
>  		struct net *net = cifs_net_ns(server);
> -		struct sock *sk;
>  
> -		rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
> -					IPPROTO_TCP, &server->ssocket);
> +		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> +				      IPPROTO_TCP, &server->ssocket);
>  		if (rc < 0) {
>  			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>  			return rc;
>  		}
>  
> -		sk = server->ssocket->sk;
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
> -		sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);

AFAICS the above implicitly adds a missing net_passive_dec(net), which
in turns looks like a separate bugfix. What about adding a separate
patch introducing that line? Could be in the same series to simplify the
processing.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
  2025-05-22  8:55   ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
@ 2025-05-22 16:12     ` Kuniyuki Iwashima
  2025-05-22 16:38     ` Chuck Lever
  2025-05-23  4:23     ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-22 16:12 UTC (permalink / raw)
  To: pabeni
  Cc: axboe, chuck.lever, davem, edumazet, hch, horms, jaka, jlayton,
	kbusch, kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
	matttbe, mptcp, netdev, sfrench, wenjia, willemb

From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 22 May 2025 10:55:47 +0200
> On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
> > Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
> > count the netns of kernel sockets."), TCP kernel socket has caused
> > many UAF.
> > 
> > We have converted such sockets to hold netns refcnt, and we have
> > the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
> > 
> >   __sock_create_kern(..., &sock);
> >   sk_net_refcnt_upgrade(sock->sk);
> > 
> > Let's drop the conversion and use sock_create_kern() instead.
> > 
> > The changes for cifs, mptcp, nvme, and smc are straightforward.
> > 
> > For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> > call __sock_create_kern() for others.
> > 
> > For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
> > sockets.
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> 
> This LGTM, but is touching a few other subsystems, it would be great to
> collect acks from the relevant maintainers: I'm adding a few CCs.
> 
> Direct link to the series:
> 
> https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
> 
> > diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> > index 37a2ba38f10e..c7b4f5a7cca1 100644
> > --- a/fs/smb/client/connect.c
> > +++ b/fs/smb/client/connect.c
> > @@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
> >  		socket = server->ssocket;
> >  	} else {
> >  		struct net *net = cifs_net_ns(server);
> > -		struct sock *sk;
> >  
> > -		rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
> > -					IPPROTO_TCP, &server->ssocket);
> > +		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> > +				      IPPROTO_TCP, &server->ssocket);
> >  		if (rc < 0) {
> >  			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
> >  			return rc;
> >  		}
> >  
> > -		sk = server->ssocket->sk;
> > -		__netns_tracker_free(net, &sk->ns_tracker, false);
> > -		sk->sk_net_refcnt = 1;
> > -		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > -		sock_inuse_add(net, 1);
> 
> AFAICS the above implicitly adds a missing net_passive_dec(net), which
> in turns looks like a separate bugfix. What about adding a separate
> patch introducing that line? Could be in the same series to simplify the
> processing.

Thanks for catching!

Will add this patch before this change.

---8<---
commit c7ff005fe4d930169f319aca0bd9577541cd7459 (HEAD)
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Thu May 22 16:03:29 2025 +0000

    smb: client: Add missing net_passive_dec().
    
    While reverting commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock
    after rmmod"), I should have added net_passive_dec(), which was added between
    the original commit and the revert by commit 5c70eb5c593d ("net: better track
    kernel sockets lifetime").
    
    Let's call net_passive_dec() in generic_ip_connect().
    
    Note that this commit is only needed for 6.14+.
    
    Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
    Cc: <stable@vger.kernel.org> # 6.14.x
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 37a2ba38f10e..afac23a5a3ec 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3359,6 +3359,7 @@ generic_ip_connect(struct TCP_Server_Info *server)
 
                sk = server->ssocket->sk;
                __netns_tracker_free(net, &sk->ns_tracker, false);
+               net_passive_dec(net);
                sk->sk_net_refcnt = 1;
                get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
                sock_inuse_add(net, 1);
---8<---

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
  2025-05-22  8:55   ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
  2025-05-22 16:12     ` Kuniyuki Iwashima
@ 2025-05-22 16:38     ` Chuck Lever
  2025-05-22 17:04       ` Kuniyuki Iwashima
  2025-05-23  4:23     ` Christoph Hellwig
  2 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-05-22 16:38 UTC (permalink / raw)
  To: Paolo Abeni, Kuniyuki Iwashima, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Willem de Bruijn
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, Jeff Layton, linux-nfs,
	Keith Busch, Jens Axboe, Christoph Hellwig, Wenjia Zhang,
	Jan Karcher, Steve French, linux-rdma@vger.kernel.org, linux-nvme,
	Matthieu Baerts, MPTCP Linux

On 5/22/25 4:55 AM, Paolo Abeni wrote:
> On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
>> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
>> count the netns of kernel sockets."), TCP kernel socket has caused
>> many UAF.
>>
>> We have converted such sockets to hold netns refcnt, and we have
>> the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
>>
>>   __sock_create_kern(..., &sock);
>>   sk_net_refcnt_upgrade(sock->sk);
>>
>> Let's drop the conversion and use sock_create_kern() instead.
>>
>> The changes for cifs, mptcp, nvme, and smc are straightforward.
>>
>> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
>> call __sock_create_kern() for others.
>>
>> For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
>> sockets.
>>
>> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> 
> This LGTM, but is touching a few other subsystems, it would be great to
> collect acks from the relevant maintainers: I'm adding a few CCs.
> 
> Direct link to the series:
> 
> https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t

Thank you, Paolo, for forwarding this series.

For all hunks modifying net/sunrpc/svcsock.c and
net/handshake/handshake-test.c:

  Acked-by: Chuck Lever <chuck.lever@oracle.com>

Regarding patch 4/6:

This paragraph in the patch description needs to explain /why/ sunrpc
is an exception:

> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call __sock_create_kern() for others.

The below hunk doesn't seem related to the marquee purpose of this
series. Should it be a separate patch with its own rationale?

@@ -1541,8 +1544,8 @@ static struct svc_xprt *svc_create_socket(struct
svc_serv *serv,
 	newlen = error;

 	if (protocol == IPPROTO_TCP) {
-		sk_net_refcnt_upgrade(sock->sk);
-		if ((error = kernel_listen(sock, 64)) < 0)
+		error = kernel_listen(sock, 64);
+		if (error < 0)
 			goto bummer;
 	}


>> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
>> index 37a2ba38f10e..c7b4f5a7cca1 100644
>> --- a/fs/smb/client/connect.c
>> +++ b/fs/smb/client/connect.c
>> @@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
>>  		socket = server->ssocket;
>>  	} else {
>>  		struct net *net = cifs_net_ns(server);
>> -		struct sock *sk;
>>  
>> -		rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
>> -					IPPROTO_TCP, &server->ssocket);
>> +		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
>> +				      IPPROTO_TCP, &server->ssocket);
>>  		if (rc < 0) {
>>  			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>>  			return rc;
>>  		}
>>  
>> -		sk = server->ssocket->sk;
>> -		__netns_tracker_free(net, &sk->ns_tracker, false);
>> -		sk->sk_net_refcnt = 1;
>> -		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
>> -		sock_inuse_add(net, 1);
> 
> AFAICS the above implicitly adds a missing net_passive_dec(net), which
> in turns looks like a separate bugfix. What about adding a separate
> patch introducing that line? Could be in the same series to simplify the
> processing.
> 
> Thanks,
> 
> Paolo
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
  2025-05-22 16:38     ` Chuck Lever
@ 2025-05-22 17:04       ` Kuniyuki Iwashima
  0 siblings, 0 replies; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-22 17:04 UTC (permalink / raw)
  To: chuck.lever
  Cc: axboe, davem, edumazet, hch, horms, jaka, jlayton, kbusch, kuba,
	kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
	mptcp, netdev, pabeni, sfrench, wenjia, willemb

From: Chuck Lever <chuck.lever@oracle.com>
Date: Thu, 22 May 2025 12:38:03 -0400
> On 5/22/25 4:55 AM, Paolo Abeni wrote:
> > On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
> >> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
> >> count the netns of kernel sockets."), TCP kernel socket has caused
> >> many UAF.
> >>
> >> We have converted such sockets to hold netns refcnt, and we have
> >> the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
> >>
> >>   __sock_create_kern(..., &sock);
> >>   sk_net_refcnt_upgrade(sock->sk);
> >>
> >> Let's drop the conversion and use sock_create_kern() instead.
> >>
> >> The changes for cifs, mptcp, nvme, and smc are straightforward.
> >>
> >> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> >> call __sock_create_kern() for others.
> >>
> >> For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
> >> sockets.
> >>
> >> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > 
> > This LGTM, but is touching a few other subsystems, it would be great to
> > collect acks from the relevant maintainers: I'm adding a few CCs.
> > 
> > Direct link to the series:
> > 
> > https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
> 
> Thank you, Paolo, for forwarding this series.
> 
> For all hunks modifying net/sunrpc/svcsock.c and
> net/handshake/handshake-test.c:
> 
>   Acked-by: Chuck Lever <chuck.lever@oracle.com>
> 
> Regarding patch 4/6:
> 
> This paragraph in the patch description needs to explain /why/ sunrpc
> is an exception:
> 
> > For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> > call __sock_create_kern() for others.

Sorry I noticed this sentence was not updated from the previous series.

I'll change it as follows

    For sunrpc, we call sk_net_refcnt_upgrade() for IPPROTO_TCP only
    so we use sock_create_kern() for TCP and keep __sock_create_kern()
    for others.


> 
> The below hunk doesn't seem related to the marquee purpose of this
> series. Should it be a separate patch with its own rationale?
> 
> @@ -1541,8 +1544,8 @@ static struct svc_xprt *svc_create_socket(struct
> svc_serv *serv,
>  	newlen = error;
> 
>  	if (protocol == IPPROTO_TCP) {
> -		sk_net_refcnt_upgrade(sock->sk);

The part above is related, and the below is not, using the old
style warned by checkpatch, so I cleaned it up while at it but
didn't think it's worth a patch.  I'm fine to drop it.


> -		if ((error = kernel_listen(sock, 64)) < 0)
> +		error = kernel_listen(sock, 64);
> +		if (error < 0)
>  			goto bummer;
>  	}
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
  2025-05-22  8:55   ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
  2025-05-22 16:12     ` Kuniyuki Iwashima
  2025-05-22 16:38     ` Chuck Lever
@ 2025-05-23  4:23     ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-05-23  4:23 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, netdev,
	Chuck Lever, Jeff Layton, linux-nfs, Keith Busch, Jens Axboe,
	Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
	linux-rdma@vger.kernel.org, linux-nvme, Matthieu Baerts,
	MPTCP Linux

On Thu, May 22, 2025 at 10:55:47AM +0200, Paolo Abeni wrote:
> https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t

Kuniyuki, please resend this with linux-nvme included to allow for
easy reviewing, for which lore unfortunately is not very useful.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-05-23  4:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250517035120.55560-1-kuniyu@amazon.com>
     [not found] ` <20250517035120.55560-5-kuniyu@amazon.com>
2025-05-22  8:55   ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
2025-05-22 16:12     ` Kuniyuki Iwashima
2025-05-22 16:38     ` Chuck Lever
2025-05-22 17:04       ` Kuniyuki Iwashima
2025-05-23  4:23     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox