* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
[not found] ` <20250517035120.55560-5-kuniyu@amazon.com>
@ 2025-05-22 8:55 ` Paolo Abeni
2025-05-22 16:12 ` Kuniyuki Iwashima
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-05-22 8:55 UTC (permalink / raw)
To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, netdev, Chuck Lever, Jeff Layton,
linux-nfs, Keith Busch, Jens Axboe, Christoph Hellwig,
Wenjia Zhang, Jan Karcher, Steve French,
linux-rdma@vger.kernel.org, linux-nvme, Matthieu Baerts,
MPTCP Linux
On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
> count the netns of kernel sockets."), TCP kernel socket has caused
> many UAF.
>
> We have converted such sockets to hold netns refcnt, and we have
> the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
>
> __sock_create_kern(..., &sock);
> sk_net_refcnt_upgrade(sock->sk);
>
> Let's drop the conversion and use sock_create_kern() instead.
>
> The changes for cifs, mptcp, nvme, and smc are straightforward.
>
> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call __sock_create_kern() for others.
>
> For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
> sockets.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
This LGTM, but is touching a few other subsystems, it would be great to
collect acks from the relevant maintainers: I'm adding a few CCs.
Direct link to the series:
https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index 37a2ba38f10e..c7b4f5a7cca1 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
> socket = server->ssocket;
> } else {
> struct net *net = cifs_net_ns(server);
> - struct sock *sk;
>
> - rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
> - IPPROTO_TCP, &server->ssocket);
> + rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> + IPPROTO_TCP, &server->ssocket);
> if (rc < 0) {
> cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
> return rc;
> }
>
> - sk = server->ssocket->sk;
> - __netns_tracker_free(net, &sk->ns_tracker, false);
> - sk->sk_net_refcnt = 1;
> - get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> - sock_inuse_add(net, 1);
AFAICS the above implicitly adds a missing net_passive_dec(net), which
in turns looks like a separate bugfix. What about adding a separate
patch introducing that line? Could be in the same series to simplify the
processing.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
2025-05-22 8:55 ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
@ 2025-05-22 16:12 ` Kuniyuki Iwashima
2025-05-22 16:38 ` Chuck Lever
2025-05-23 4:23 ` Christoph Hellwig
2 siblings, 0 replies; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-22 16:12 UTC (permalink / raw)
To: pabeni
Cc: axboe, chuck.lever, davem, edumazet, hch, horms, jaka, jlayton,
kbusch, kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, sfrench, wenjia, willemb
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 22 May 2025 10:55:47 +0200
> On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
> > Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
> > count the netns of kernel sockets."), TCP kernel socket has caused
> > many UAF.
> >
> > We have converted such sockets to hold netns refcnt, and we have
> > the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
> >
> > __sock_create_kern(..., &sock);
> > sk_net_refcnt_upgrade(sock->sk);
> >
> > Let's drop the conversion and use sock_create_kern() instead.
> >
> > The changes for cifs, mptcp, nvme, and smc are straightforward.
> >
> > For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> > call __sock_create_kern() for others.
> >
> > For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
> > sockets.
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>
> This LGTM, but is touching a few other subsystems, it would be great to
> collect acks from the relevant maintainers: I'm adding a few CCs.
>
> Direct link to the series:
>
> https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
>
> > diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> > index 37a2ba38f10e..c7b4f5a7cca1 100644
> > --- a/fs/smb/client/connect.c
> > +++ b/fs/smb/client/connect.c
> > @@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
> > socket = server->ssocket;
> > } else {
> > struct net *net = cifs_net_ns(server);
> > - struct sock *sk;
> >
> > - rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
> > - IPPROTO_TCP, &server->ssocket);
> > + rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> > + IPPROTO_TCP, &server->ssocket);
> > if (rc < 0) {
> > cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
> > return rc;
> > }
> >
> > - sk = server->ssocket->sk;
> > - __netns_tracker_free(net, &sk->ns_tracker, false);
> > - sk->sk_net_refcnt = 1;
> > - get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > - sock_inuse_add(net, 1);
>
> AFAICS the above implicitly adds a missing net_passive_dec(net), which
> in turns looks like a separate bugfix. What about adding a separate
> patch introducing that line? Could be in the same series to simplify the
> processing.
Thanks for catching!
Will add this patch before this change.
---8<---
commit c7ff005fe4d930169f319aca0bd9577541cd7459 (HEAD)
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Thu May 22 16:03:29 2025 +0000
smb: client: Add missing net_passive_dec().
While reverting commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock
after rmmod"), I should have added net_passive_dec(), which was added between
the original commit and the revert by commit 5c70eb5c593d ("net: better track
kernel sockets lifetime").
Let's call net_passive_dec() in generic_ip_connect().
Note that this commit is only needed for 6.14+.
Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 37a2ba38f10e..afac23a5a3ec 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3359,6 +3359,7 @@ generic_ip_connect(struct TCP_Server_Info *server)
sk = server->ssocket->sk;
__netns_tracker_free(net, &sk->ns_tracker, false);
+ net_passive_dec(net);
sk->sk_net_refcnt = 1;
get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
sock_inuse_add(net, 1);
---8<---
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
2025-05-22 8:55 ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
2025-05-22 16:12 ` Kuniyuki Iwashima
@ 2025-05-22 16:38 ` Chuck Lever
2025-05-22 17:04 ` Kuniyuki Iwashima
2025-05-23 4:23 ` Christoph Hellwig
2 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-05-22 16:38 UTC (permalink / raw)
To: Paolo Abeni, Kuniyuki Iwashima, David S. Miller, Eric Dumazet,
Jakub Kicinski, Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, netdev, Jeff Layton, linux-nfs,
Keith Busch, Jens Axboe, Christoph Hellwig, Wenjia Zhang,
Jan Karcher, Steve French, linux-rdma@vger.kernel.org, linux-nvme,
Matthieu Baerts, MPTCP Linux
On 5/22/25 4:55 AM, Paolo Abeni wrote:
> On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
>> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
>> count the netns of kernel sockets."), TCP kernel socket has caused
>> many UAF.
>>
>> We have converted such sockets to hold netns refcnt, and we have
>> the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
>>
>> __sock_create_kern(..., &sock);
>> sk_net_refcnt_upgrade(sock->sk);
>>
>> Let's drop the conversion and use sock_create_kern() instead.
>>
>> The changes for cifs, mptcp, nvme, and smc are straightforward.
>>
>> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
>> call __sock_create_kern() for others.
>>
>> For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
>> sockets.
>>
>> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>
> This LGTM, but is touching a few other subsystems, it would be great to
> collect acks from the relevant maintainers: I'm adding a few CCs.
>
> Direct link to the series:
>
> https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
Thank you, Paolo, for forwarding this series.
For all hunks modifying net/sunrpc/svcsock.c and
net/handshake/handshake-test.c:
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Regarding patch 4/6:
This paragraph in the patch description needs to explain /why/ sunrpc
is an exception:
> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call __sock_create_kern() for others.
The below hunk doesn't seem related to the marquee purpose of this
series. Should it be a separate patch with its own rationale?
@@ -1541,8 +1544,8 @@ static struct svc_xprt *svc_create_socket(struct
svc_serv *serv,
newlen = error;
if (protocol == IPPROTO_TCP) {
- sk_net_refcnt_upgrade(sock->sk);
- if ((error = kernel_listen(sock, 64)) < 0)
+ error = kernel_listen(sock, 64);
+ if (error < 0)
goto bummer;
}
>> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
>> index 37a2ba38f10e..c7b4f5a7cca1 100644
>> --- a/fs/smb/client/connect.c
>> +++ b/fs/smb/client/connect.c
>> @@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
>> socket = server->ssocket;
>> } else {
>> struct net *net = cifs_net_ns(server);
>> - struct sock *sk;
>>
>> - rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
>> - IPPROTO_TCP, &server->ssocket);
>> + rc = sock_create_kern(net, sfamily, SOCK_STREAM,
>> + IPPROTO_TCP, &server->ssocket);
>> if (rc < 0) {
>> cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>> return rc;
>> }
>>
>> - sk = server->ssocket->sk;
>> - __netns_tracker_free(net, &sk->ns_tracker, false);
>> - sk->sk_net_refcnt = 1;
>> - get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
>> - sock_inuse_add(net, 1);
>
> AFAICS the above implicitly adds a missing net_passive_dec(net), which
> in turns looks like a separate bugfix. What about adding a separate
> patch introducing that line? Could be in the same series to simplify the
> processing.
>
> Thanks,
>
> Paolo
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
2025-05-22 16:38 ` Chuck Lever
@ 2025-05-22 17:04 ` Kuniyuki Iwashima
0 siblings, 0 replies; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-22 17:04 UTC (permalink / raw)
To: chuck.lever
Cc: axboe, davem, edumazet, hch, horms, jaka, jlayton, kbusch, kuba,
kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Chuck Lever <chuck.lever@oracle.com>
Date: Thu, 22 May 2025 12:38:03 -0400
> On 5/22/25 4:55 AM, Paolo Abeni wrote:
> > On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
> >> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
> >> count the netns of kernel sockets."), TCP kernel socket has caused
> >> many UAF.
> >>
> >> We have converted such sockets to hold netns refcnt, and we have
> >> the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
> >>
> >> __sock_create_kern(..., &sock);
> >> sk_net_refcnt_upgrade(sock->sk);
> >>
> >> Let's drop the conversion and use sock_create_kern() instead.
> >>
> >> The changes for cifs, mptcp, nvme, and smc are straightforward.
> >>
> >> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> >> call __sock_create_kern() for others.
> >>
> >> For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
> >> sockets.
> >>
> >> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> >
> > This LGTM, but is touching a few other subsystems, it would be great to
> > collect acks from the relevant maintainers: I'm adding a few CCs.
> >
> > Direct link to the series:
> >
> > https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
>
> Thank you, Paolo, for forwarding this series.
>
> For all hunks modifying net/sunrpc/svcsock.c and
> net/handshake/handshake-test.c:
>
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
>
> Regarding patch 4/6:
>
> This paragraph in the patch description needs to explain /why/ sunrpc
> is an exception:
>
> > For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> > call __sock_create_kern() for others.
Sorry I noticed this sentence was not updated from the previous series.
I'll change it as follows
For sunrpc, we call sk_net_refcnt_upgrade() for IPPROTO_TCP only
so we use sock_create_kern() for TCP and keep __sock_create_kern()
for others.
>
> The below hunk doesn't seem related to the marquee purpose of this
> series. Should it be a separate patch with its own rationale?
>
> @@ -1541,8 +1544,8 @@ static struct svc_xprt *svc_create_socket(struct
> svc_serv *serv,
> newlen = error;
>
> if (protocol == IPPROTO_TCP) {
> - sk_net_refcnt_upgrade(sock->sk);
The part above is related, and the below is not, using the old
style warned by checkpatch, so I cleaned it up while at it but
didn't think it's worth a patch. I'm fine to drop it.
> - if ((error = kernel_listen(sock, 64)) < 0)
> + error = kernel_listen(sock, 64);
> + if (error < 0)
> goto bummer;
> }
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.
2025-05-22 8:55 ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
2025-05-22 16:12 ` Kuniyuki Iwashima
2025-05-22 16:38 ` Chuck Lever
@ 2025-05-23 4:23 ` Christoph Hellwig
2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-05-23 4:23 UTC (permalink / raw)
To: Paolo Abeni
Cc: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, netdev,
Chuck Lever, Jeff Layton, linux-nfs, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
linux-rdma@vger.kernel.org, linux-nvme, Matthieu Baerts,
MPTCP Linux
On Thu, May 22, 2025 at 10:55:47AM +0200, Paolo Abeni wrote:
> https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t
Kuniyuki, please resend this with linux-nvme included to allow for
easy reviewing, for which lore unfortunately is not very useful.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-05-23 4:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250517035120.55560-1-kuniyu@amazon.com>
[not found] ` <20250517035120.55560-5-kuniyu@amazon.com>
2025-05-22 8:55 ` [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/ Paolo Abeni
2025-05-22 16:12 ` Kuniyuki Iwashima
2025-05-22 16:38 ` Chuck Lever
2025-05-22 17:04 ` Kuniyuki Iwashima
2025-05-23 4:23 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox