From: Kuniyuki Iwashima <kuniyu@google.com>
To: pabeni@redhat.com
Cc: ap420073@gmail.com, davem@davemloft.net, dsahern@kernel.org,
edumazet@google.com, horms@kernel.org, jiayuan.chen@linux.dev,
jiayuan.chen@shopee.com, josef@toxicpanda.com, kuba@kernel.org,
linux-kernel@vger.kernel.org, nbd@other.debian.org,
netdev@vger.kernel.org,
syzbot+afbcf622635e98bf40d2@syzkaller.appspotmail.com
Subject: Re: [PATCH net v1] net/ipv6: mcast: fix circular locking dependency in __ipv6_dev_mc_inc()
Date: Mon, 23 Mar 2026 06:54:05 +0000 [thread overview]
Message-ID: <20260323065417.710353-1-kuniyu@google.com> (raw)
In-Reply-To: <258f99ac-bd34-4d14-8271-1266b9aba6f8@redhat.com>
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 19 Mar 2026 13:44:00 +0100
> Adding Josef.
>
> On 3/19/26 4:26 AM, Jakub Kicinski wrote:
> > On Thu, 19 Mar 2026 11:04:24 +0800 Jiayuan Chen wrote:
> >>>> Split mca_alloc() into mca_alloc() + mca_init(): mca_alloc() does the
> >>>> GFP_KERNEL allocation before mc_lock, mca_init() initializes under
> >>>> mc_lock. If the address already exists, the pre-allocated memory is
> >>>> simply freed. Also move inet6_ifmcaddr_notify() outside mc_lock since
> >>>> it also does GFP_KERNEL allocation.
> >>> Moving the allocation seems fine, but also having to move the
> >>> notification, potentially letting the notification go out of order
> >>> makes me wonder if we aren't better off adding helpers for taking this
> >>> lock which also call memalloc_noio_{save,restore} ?
> >> Yeah, using memalloc_noio helpers is simpler. I checked and there
> >> are about 18 places taking mc_lock, so having a common mc_lock()/mc_unlock()
> >> wrapper that does the noio save/restore covers them all (if necessary).
> >>
> >> The only thing that feels a bit odd is using memalloc_noio in the networking
> >> subsystem. It makes sense in block/fs to protect itself from recursion.
> >
> > Totally agree that it feels a bit odd that we have to worry about IO,
> > but unless we can figure out a way to prevent nbd sockets from getting
> > here all our solutions are dealing with noio in networking code :(
> > IMHO it's better to acknowledge this with the explicit memalloc_noio
> > so future developers don't break things again with a mis-placed
> > allocation.
>
> I think a problem here is that the nbd socket is still exposed to
> user-space, while in use by the block device.
Right, and this also prevents us from changing lockdep keys
(due to lock_sock_fast()).
> I fear that the more
> syzkaller will learn new tricks, the more we will have to had strange
> noio all around the networking code.
I agree. We have 100+ unpublished reports for this variant, and
I started looking into them last week.
>
> I *think* we could prevent this kind of races with something alike the
> following:
> - nbd sets a DOIO sk flag on the sockets it uses.
> - the socket layer prevents socketopts()/ioctl() entirely on DOIO sk
NBD sets sk->sk_allocation, so if we use it around, there should
be no real race, but still lockdep would not be happy.
The problem is sendmsg() and shutdown() invoked from NBD (recmsg()
is okay which is not under tx_lock), and my idea is to implemnt the
trylock variant for lock_sock() and use it for TCP. ( AF_UNIX does
not have the race because it does not use lock_sock() in sendmsg()
and shutdown() )
sendmsg() will be retried with -ERESTARTSYS where needed thanks to
was_interrupted(). While sendmsg() for disconnect and shutdown()
could be missed, but probably not a big deal (the peer will get
RST when it send something) ?
If this looks good, I'll send a formal patch.
---8<---
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index fe63f3c55d0d..9003baf52be8 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -45,6 +45,8 @@
#include <linux/nbd.h>
#include <linux/nbd-netlink.h>
#include <net/genetlink.h>
+#include <net/tcp.h>
+#include <net/inet_common.h>
#define CREATE_TRACE_POINTS
#include <trace/events/nbd.h>
@@ -302,6 +304,19 @@ static int nbd_disconnected(struct nbd_config *config)
test_bit(NBD_RT_DISCONNECT_REQUESTED, &config->runtime_flags);
}
+static void nbd_sock_shutdown(struct sock *sk)
+{
+ if (sk_is_stream_unix(sk)) {
+ kernel_sock_shutdown(sk->sk_socket, SHUT_RDWR);
+ return;
+ }
+
+ if (lock_sock_try(sk)) {
+ inet_shutdown_locked(sk->sk_socket, SHUT_RDWR);
+ release_sock(sk);
+ }
+}
+
static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
int notify)
{
@@ -315,7 +330,8 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
}
}
if (!nsock->dead) {
- kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+ nbd_sock_shutdown(nsock->sock->sk);
+
if (atomic_dec_return(&nbd->config->live_connections) == 0) {
if (test_and_clear_bit(NBD_RT_DISCONNECT_REQUESTED,
&nbd->config->runtime_flags)) {
@@ -548,6 +564,22 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req)
return BLK_EH_DONE;
}
+static int nbd_sock_sendmsg(struct socket *sock, struct msghdr *msg)
+{
+ struct sock *sk = sock->sk;
+ int err = -ERESTARTSYS;
+
+ if (sk_is_stream_unix(sk))
+ return sock_sendmsg(sock, msg);
+
+ if (lock_sock_try(sk)) {
+ err = tcp_sendmsg_locked(sk, msg, msg_data_left(msg));
+ release_sock(sk);
+ }
+
+ return err;
+}
+
static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
struct iov_iter *iter, int msg_flags, int *sent)
{
@@ -573,7 +605,7 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
msg.msg_flags = msg_flags | MSG_NOSIGNAL;
if (send)
- result = sock_sendmsg(sock, &msg);
+ result = nbd_sock_sendmsg(sock, &msg);
else
result = sock_recvmsg(sock, &msg, msg.msg_flags);
@@ -1228,6 +1260,13 @@ static struct socket *nbd_get_socket(struct nbd_device *nbd, unsigned long fd,
return NULL;
}
+ if (READ_ONCE(sock->sk->sk_state) != TCP_ESTABLISHED) {
+ dev_err(disk_to_dev(nbd->disk), "Unsupported socket: not connected yet.\n");
+ *err = -ENOTCONN;
+ sockfd_put(sock);
+ return NULL;
+ }
+
if (sock->ops->shutdown == sock_no_shutdown) {
dev_err(disk_to_dev(nbd->disk), "Unsupported socket: shutdown callout must be supported.\n");
*err = -EINVAL;
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 5dd2bf24449e..c085c39573c9 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -38,6 +38,7 @@ void inet_splice_eof(struct socket *sock);
int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags);
int inet_shutdown(struct socket *sock, int how);
+int inet_shutdown_locked(struct socket *sock, int how);
int inet_listen(struct socket *sock, int backlog);
int __inet_listen_sk(struct sock *sk, int backlog);
void inet_sock_destruct(struct sock *sk);
diff --git a/include/net/sock.h b/include/net/sock.h
index 6c9a83016e95..203a60661fce 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1710,6 +1710,24 @@ static inline void lock_sock(struct sock *sk)
}
void __lock_sock(struct sock *sk);
+
+static inline bool lock_sock_try(struct sock *sk)
+{
+ if (!spin_trylock_bh(&sk->sk_lock.slock))
+ return false;
+
+ if (sk->sk_lock.owned) {
+ spin_unlock_bh(&sk->sk_lock.slock);
+ return false;
+ }
+
+ sk->sk_lock.owned = 1;
+ spin_unlock_bh(&sk->sk_lock.slock);
+
+ mutex_acquire(&sk->sk_lock.dep_map, 0, 1, _RET_IP_);
+ return true;
+}
+
void __release_sock(struct sock *sk);
void release_sock(struct sock *sk);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8036e76aa1e4..bde8ddcc28f0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -896,21 +896,11 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
}
EXPORT_SYMBOL(inet_recvmsg);
-int inet_shutdown(struct socket *sock, int how)
+static int __inet_shutdown(struct socket *sock, int how)
{
struct sock *sk = sock->sk;
int err = 0;
- /* This should really check to make sure
- * the socket is a TCP socket. (WHY AC...)
- */
- how++; /* maps 0->1 has the advantage of making bit 1 rcvs and
- 1->2 bit 2 snds.
- 2->3 */
- if ((how & ~SHUTDOWN_MASK) || !how) /* MAXINT->0 */
- return -EINVAL;
-
- lock_sock(sk);
if (sock->state == SS_CONNECTING) {
if ((1 << sk->sk_state) &
(TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_CLOSE))
@@ -947,11 +937,45 @@ int inet_shutdown(struct socket *sock, int how)
/* Wake up anyone sleeping in poll. */
sk->sk_state_change(sk);
+
+ return err;
+}
+
+int inet_shutdown(struct socket *sock, int how)
+{
+ struct sock *sk = sock->sk;
+ int err;
+
+ /* maps 0->1 has the advantage of making bit 1 rcvs and
+ * 1->2 bit 2 snds.
+ * 2->3
+ */
+ how++;
+
+ if ((how & ~SHUTDOWN_MASK) || !how)
+ return -EINVAL;
+
+ lock_sock(sk);
+ err = __inet_shutdown(sock, how);
release_sock(sk);
+
return err;
}
EXPORT_SYMBOL(inet_shutdown);
+int inet_shutdown_locked(struct socket *sock, int how)
+{
+ sock_owned_by_me(sock->sk);
+
+ how++;
+
+ if ((how & ~SHUTDOWN_MASK) || !how)
+ return -EINVAL;
+
+ return __inet_shutdown(sock, how);
+}
+EXPORT_SYMBOL_GPL(inet_shutdown_locked);
+
/*
* ioctl() calls you can issue on an INET socket. Most of these are
* device configuration and stuff and very rarely used. Some ioctls
---8<---
next prev parent reply other threads:[~2026-03-23 6:54 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 11:12 [PATCH net v1] net/ipv6: mcast: fix circular locking dependency in __ipv6_dev_mc_inc() Jiayuan Chen
2026-03-19 1:15 ` Jakub Kicinski
2026-03-19 3:04 ` Jiayuan Chen
2026-03-19 3:26 ` Jakub Kicinski
2026-03-19 4:12 ` Jiayuan Chen
2026-03-19 12:44 ` Paolo Abeni
2026-03-19 15:36 ` Wouter Verhelst
2026-03-23 6:54 ` Kuniyuki Iwashima [this message]
2026-03-19 12:33 ` [net,v1] " Paolo Abeni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260323065417.710353-1-kuniyu@google.com \
--to=kuniyu@google.com \
--cc=ap420073@gmail.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jiayuan.chen@linux.dev \
--cc=jiayuan.chen@shopee.com \
--cc=josef@toxicpanda.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nbd@other.debian.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=syzbot+afbcf622635e98bf40d2@syzkaller.appspotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox