Hi!

On 5/6/26 03:41, Tung Quang Nguyen wrote:
>> Subject: [PATCH net] tipc: avoid sending zero-length stream messages
>>
>> TIPC stream send currently enters the transmit loop even when the user
>> payload length is zero. This can build and transmit a header-only connection
>> message.
>>
>> For local TIPC sockets, such messages are delivered synchronously through the
>> loopback receive path. When this happens while socket backlog processing is
>> being flushed, reply transmission can re-enter TIPC receive processing
>> repeatedly and trigger an RCU stall.
>>
> Can you demonstrate this scenario using code ? It is better to point out
> what current code is faulty.

The minimized user-visible trigger is essentially:

	int fd[2];
	struct msghdr msg = {};

	socketpair(AF_TIPC, SOCK_STREAM, 0, fd);

	/* In parallel, this makes release_sock() flush backlog. */
	setsockopt(fd[0], SOL_SOCKET, SO_ATTACH_BPF, &bad_fd, sizeof(bad_fd));

	/* Repeated zero-length MSG_PROBE send on the connected peer. */
	for (i = 0; i < 64; i++)
		sendmsg(fd[1], &msg, MSG_PROBE | MSG_MORE);

The faulty current-code path is that TIPC stream send does not handle
MSG_PROBE before entering __tipc_sendstream(). MSG_PROBE is supposed to
probe without transmitting data, but the call reaches
__tipc_sendstream() with dlen == 0.

__tipc_sendstream() uses a do/while loop, so even when dlen is 0 the
body runs once:

	send = min_t(size_t, dlen - sent, TIPC_MAX_USER_MSG_SIZE);

At that point send is 0, but the code can still call tipc_msg_append()
or tipc_msg_build(), creating a TIPC connection message with only the
header. It then calls:

	tipc_node_xmit(net, txq, dnode, tsk->portid);

For a local TIPC socketpair, tipc_node_xmit() takes the in_own_node()
path and synchronously calls tipc_sk_rcv(). When this happens while
release_sock() is processing backlog, the receive path can generate
response traffic through tipc_node_distr_xmit(), which re-enters the
same local receive path.
I should have made that explicit in the changelog and pointed at the
missing MSG_PROBE handling as the faulty part.

>>
>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c index
>> 9329919fb07f..3c7838713d74 100644
>> --- a/net/tipc/socket.c
>> +++ b/net/tipc/socket.c
>> @@ -1585,6 +1585,8 @@ static int __tipc_sendstream(struct socket *sock,
>> struct msghdr *m, size_t dlen)
>>  					tipc_sk_connected(sk)));
>>  		if (unlikely(rc))
>>  			break;
>> +		if (unlikely(!dlen && sk->sk_type == SOCK_STREAM))
>> +			break;
> This change is wrong. It immediately breaks normal connection set up
> because the ACK (zero in length) has no chance to be sent back from the
> server to the client.
> Please try to test your patch before submission.

I did test the patch with the syzkaller C repro under QEMU for 10
minutes, and it did not trigger the reported RCU stall:

	/tmp/repro & pid=$!; sleep 600; kill $pid
	dmesg | grep -Ei 'rcu.*stall|rcu_preempt|soft lockup|panic|BUG|WARNING'
	(attached)

The dmesg check did not show any repro-triggered RCU stall, soft
lockup, panic, BUG, or WARNING. But that test only covered the
syzkaller trigger; it did not cover normal active/passive TIPC stream
connection setup, which your review points out is broken by this
version.

I re-checked the TIPC connection setup path as well. tipc_accept()
intentionally sends the server-side ACK as a zero-length stream
message:

	iov_iter_kvec(&m.msg_iter, ITER_SOURCE, NULL, 0, 0);
	__tipc_sendstream(new_sock, &m, 0);

So blocking all zero-length sends inside __tipc_sendstream() prevents
that ACK from being transmitted and can break normal SOCK_STREAM
connection setup.

After re-checking the syzkaller repro, the real trigger seems to be
narrower than zero-length stream send. The repro uses a user sendmsg()
with MSG_PROBE | MSG_MORE and no payload on an already connected TIPC
stream socket.
MSG_PROBE is supposed to probe without sending, but TIPC stream send
currently lets that path reach __tipc_sendstream(), where the do/while
body can still run once with dlen == 0 and build/transmit a header-only
message.

I think we should avoid suppressing the internal __tipc_sendstream()
ACK path and instead handle the user-originated zero-length MSG_PROBE
case before it reaches the internal stream send helper.

The v2 fix would look like this:

-- 8< --
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 9329919fb07f..4783df337971 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1542,6 +1542,10 @@ static int tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dsz)
 	struct sock *sk = sock->sk;
 	int ret;
 
+	/* MSG_PROBE asks only to probe the path, not to transmit data. */
+	if (unlikely((m->msg_flags & MSG_PROBE) && !dsz))
+		return 0;
+
 	lock_sock(sk);
 	ret = __tipc_sendstream(sock, m, dsz);
 	release_sock(sk);
-- >8 --

I tested the reworked patch with the syzkaller C reproducer under QEMU.
The reproducer was run for 10 minutes:

	/tmp/repro & pid=$!; sleep 600; kill $pid
	dmesg | grep -Ei 'rcu.*stall|rcu_preempt|soft lockup|panic|BUG|WARNING'
	(attached)

The grep only matched boot-time command-line/debug messages; no
repro-triggered RCU stall, soft lockup, panic, BUG, or WARNING
appeared.

What do you think?