* [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
@ 2026-03-02 5:26 Li Xiasong
2026-03-03 11:08 ` Matthieu Baerts
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Li Xiasong @ 2026-03-02 5:26 UTC (permalink / raw)
To: matttbe, martineau, geliang, davem, edumazet, kuba, pabeni, horms
Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing,
zhangchangzhong
syzbot reported a soft lockup in mptcp_recvmsg() [0].
When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
removed from the sk_receive_queue. This causes sk_wait_data() to always
find available data and never perform actual waiting, leading to a soft
lockup.
Fix this by adding a 'last' parameter to track the last peeked skb.
This allows sk_wait_data() to make informed waiting decisions and prevent
infinite loops when MSG_PEEK is used.
[0]:
watchdog: BUG: soft lockup - CPU#2 stuck for 156s! [server:1963]
Modules linked in:
CPU: 2 UID: 0 PID: 1963 Comm: server Not tainted 6.19.0-rc8 #61 PREEMPT(none)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:sk_wait_data+0x15/0x190
Code: 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 56 41 55 41 54 49 89 f4 55 48 89 d5 53 48 89 fb <48> 83 ec 30 65 48 8b 05 17 a4 6b 01 48 89 44 24 28 31 c0 65 48 8b
RSP: 0018:ffffc90000603ca0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff888102bf0800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc90000603d18 RDI: ffff888102bf0800
RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000101
R10: 0000000000000000 R11: 0000000000000075 R12: ffffc90000603d18
R13: ffff888102bf0800 R14: ffff888102bf0800 R15: 0000000000000000
FS: 00007f6e38b8c4c0(0000) GS:ffff8881b877e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055aa7bff1680 CR3: 0000000105cbe000 CR4: 00000000000006f0
Call Trace:
<TASK>
mptcp_recvmsg+0x547/0x8c0 net/mptcp/protocol.c:2329
inet_recvmsg+0x11f/0x130 net/ipv4/af_inet.c:891
sock_recvmsg+0x94/0xc0 net/socket.c:1100
__sys_recvfrom+0xb2/0x130 net/socket.c:2256
__x64_sys_recvfrom+0x1f/0x30 net/socket.c:2267
do_syscall_64+0x59/0x2d0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e arch/x86/entry/entry_64.S:131
RIP: 0033:0x7f6e386a4a1d
Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 f1 de 2c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
RSP: 002b:00007ffc3c4bb078 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: 000000000000861e RCX: 00007f6e386a4a1d
RDX: 00000000000003ff RSI: 00007ffc3c4bb150 RDI: 0000000000000004
RBP: 00007ffc3c4bb570 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000103 R11: 0000000000000246 R12: 00005605dbc00be0
R13: 00007ffc3c4bb650 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()")
Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
---
net/mptcp/protocol.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index cf1852b99963..7a65c2101f63 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
size_t len, int flags, int copied_total,
struct scm_timestamping_internal *tss,
- int *cmsg_flags)
+ int *cmsg_flags, struct sk_buff **last)
{
struct mptcp_sock *msk = mptcp_sk(sk);
struct sk_buff *skb, *tmp;
@@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
}
mptcp_eat_recv_skb(sk, skb);
+ } else {
+ *last = skb;
}
if (copied >= len)
@@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
{
struct mptcp_sock *msk = mptcp_sk(sk);
struct scm_timestamping_internal tss;
+ struct sk_buff *last = NULL;
int copied = 0, cmsg_flags = 0;
int target;
long timeo;
@@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int err, bytes_read;
bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
- copied, &tss, &cmsg_flags);
+ copied, &tss, &cmsg_flags,
+ &last);
if (unlikely(bytes_read < 0)) {
if (!copied)
copied = bytes_read;
@@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
pr_debug("block timeout %ld\n", timeo);
mptcp_cleanup_rbuf(msk, copied);
- err = sk_wait_data(sk, &timeo, NULL);
+ err = sk_wait_data(sk, &timeo, last);
if (err < 0) {
err = copied ? : err;
goto out_err;
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() 2026-03-02 5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong @ 2026-03-03 11:08 ` Matthieu Baerts 2026-03-03 18:06 ` Matthieu Baerts 2026-03-04 9:07 ` Paolo Abeni 2 siblings, 0 replies; 7+ messages in thread From: Matthieu Baerts @ 2026-03-03 11:08 UTC (permalink / raw) To: Li Xiasong, martineau, geliang, davem, edumazet, kuba, pabeni, horms Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong Hi Li, On 02/03/2026 06:26, Li Xiasong wrote: > syzbot reported a soft lockup in mptcp_recvmsg() [0]. > > When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not > removed from the sk_receive_queue. This causes sk_wait_data() to always > find available data and never perform actual waiting, leading to a soft > lockup. > > Fix this by adding a 'last' parameter to track the last peeked skb. > This allows sk_wait_data() to make informed waiting decisions and prevent > infinite loops when MSG_PEEK is used. Thank you for the patch! I will check why sk_wait_data() was not used with the last SKB before. By chance, do you have a reproducer that you could eventually share please? Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() 2026-03-02 5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong 2026-03-03 11:08 ` Matthieu Baerts @ 2026-03-03 18:06 ` Matthieu Baerts 2026-03-04 9:24 ` Li Xiasong 2026-03-04 9:07 ` Paolo Abeni 2 siblings, 1 reply; 7+ messages in thread From: Matthieu Baerts @ 2026-03-03 18:06 UTC (permalink / raw) To: Li Xiasong, martineau, geliang, davem, edumazet, kuba, pabeni, horms Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong Hi Li, On 02/03/2026 06:26, Li Xiasong wrote: > syzbot reported a soft lockup in mptcp_recvmsg() [0]. > > When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not > removed from the sk_receive_queue. This causes sk_wait_data() to always > find available data and never perform actual waiting, leading to a soft > lockup. > > Fix this by adding a 'last' parameter to track the last peeked skb. > This allows sk_wait_data() to make informed waiting decisions and prevent > infinite loops when MSG_PEEK is used. (...) > Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()") > Signed-off-by: Li Xiasong <lixiasong1@huawei.com> > --- > net/mptcp/protocol.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index cf1852b99963..7a65c2101f63 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb) > static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, > size_t len, int flags, int copied_total, > struct scm_timestamping_internal *tss, > - int *cmsg_flags) > + int *cmsg_flags, struct sk_buff **last) > { > struct mptcp_sock *msk = mptcp_sk(sk); > struct sk_buff *skb, *tmp; > @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, > } > > mptcp_eat_recv_skb(sk, skb); > + } else { > + *last = skb; Out of curiosity, why only setting *last for MSG_PEEK? Is it not better to always call sk_wait_data() later with the last skb, even when MSG_PEEK is not used? Or will this cause other troubles? > } > > if (copied >= len) > @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, > { > struct mptcp_sock *msk = mptcp_sk(sk); > struct scm_timestamping_internal tss; > + struct sk_buff *last = NULL; Detail: the scope of this variable could eventually be reduced by moving it inside the while-loop. This should hopefully help to reduce conflicts during backports. > int copied = 0, cmsg_flags = 0; > int target; > long timeo; > @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, > int err, bytes_read; > > bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, > - copied, &tss, &cmsg_flags); > + copied, &tss, &cmsg_flags, > + &last); > if (unlikely(bytes_read < 0)) { > if (!copied) > copied = bytes_read; > @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, > > pr_debug("block timeout %ld\n", timeo); > mptcp_cleanup_rbuf(msk, copied); > - err = sk_wait_data(sk, &timeo, NULL); > + err = sk_wait_data(sk, &timeo, last); > if (err < 0) { > err = copied ? : err; > goto out_err; Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() 2026-03-03 18:06 ` Matthieu Baerts @ 2026-03-04 9:24 ` Li Xiasong 2026-03-23 11:19 ` Matthieu Baerts 0 siblings, 1 reply; 7+ messages in thread From: Li Xiasong @ 2026-03-04 9:24 UTC (permalink / raw) To: Matthieu Baerts Cc: geliang, davem, edumazet, kuba, pabeni, horms, martineau, netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong Hi Matt, On 3/4/2026 2:06 AM, Matthieu Baerts wrote: > Hi Li, > > On 02/03/2026 06:26, Li Xiasong wrote: >> syzbot reported a soft lockup in mptcp_recvmsg() [0]. >> >> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not >> removed from the sk_receive_queue. This causes sk_wait_data() to always >> find available data and never perform actual waiting, leading to a soft >> lockup. >> >> Fix this by adding a 'last' parameter to track the last peeked skb. >> This allows sk_wait_data() to make informed waiting decisions and prevent >> infinite loops when MSG_PEEK is used. > > (...) > >> Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()") >> Signed-off-by: Li Xiasong <lixiasong1@huawei.com> >> --- >> net/mptcp/protocol.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >> index cf1852b99963..7a65c2101f63 100644 >> --- a/net/mptcp/protocol.c >> +++ b/net/mptcp/protocol.c >> @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb) >> static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, >> size_t len, int flags, int copied_total, >> struct scm_timestamping_internal *tss, >> - int *cmsg_flags) >> + int *cmsg_flags, struct sk_buff **last) >> { >> struct mptcp_sock *msk = mptcp_sk(sk); >> struct sk_buff *skb, *tmp; >> @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, >> } >> >> mptcp_eat_recv_skb(sk, skb); >> + } else { >> + *last = skb; > > Out of curiosity, why only setting *last for MSG_PEEK? Is it not better > to always call sk_wait_data() later with the last skb, even when > MSG_PEEK is not used? > > Or will this cause other troubles? Yes, unconditionally updating last (like tcp_recvmsg_locked) makes sense. The current hesitation is due to mptcp_eat_recv_skb releasing the skb in non-MSG_PEEK cases—if the address is reused, keeping a last pointer could lead to misjudgment. > >> } >> >> if (copied >= len) >> @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >> { >> struct mptcp_sock *msk = mptcp_sk(sk); >> struct scm_timestamping_internal tss; >> + struct sk_buff *last = NULL; > > Detail: the scope of this variable could eventually be reduced by moving > it inside the while-loop. This should hopefully help to reduce conflicts > during backports. > You're right. My initial thought was to move `last` into the while loop, but in practice, to retain the last MSG_PEEK skb, `last` must be updated very early in __mptcp_recvmsg_mskq as we begin traversing &sk->sk_receive_queue. The issue is that if a subsequent step fails—such as skb_copy_datagram_msg—we'd then need to roll `last` back to the previous skb, which adds significant complexity. This suggests the current approach may be the safer trade-off. >> int copied = 0, cmsg_flags = 0; >> int target; >> long timeo; >> @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >> int err, bytes_read; >> >> bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, >> - copied, &tss, &cmsg_flags); >> + copied, &tss, &cmsg_flags, >> + &last); >> if (unlikely(bytes_read < 0)) { >> if (!copied) >> copied = bytes_read; >> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >> >> pr_debug("block timeout %ld\n", timeo); >> mptcp_cleanup_rbuf(msk, copied); >> - err = sk_wait_data(sk, &timeo, NULL); >> + err = sk_wait_data(sk, &timeo, last); >> if (err < 0) { >> err = copied ? : err; >> goto out_err; > Cheers, > Matt As requested, here are the two minimal test programs. Steps to reproduce 1.Save as client.c and server.c. 2.Compile: gcc client.c -o client gcc server.c -o server 3.Run ./server, then ./client in another terminal. --- client.c --- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <arpa/inet.h> #include <sys/socket.h> #include <netinet/in.h> #include <netinet/tcp.h> #include <time.h> #include <errno.h> #ifndef IPPROTO_MPTCP #define IPPROTO_MPTCP 262 #endif #define PORT 8888 int main() { int sock_fd; struct sockaddr_in server_addr; int enable = 1; if ((sock_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)) < 0) { perror("Socket creation failed"); exit(EXIT_FAILURE); } printf("Created MPTCP socket\n"); if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) < 0) { perror("Setsockopt SO_REUSEADDR failed"); close(sock_fd); exit(EXIT_FAILURE); } memset(&server_addr, 0, sizeof(server_addr)); server_addr.sin_family = AF_INET; server_addr.sin_port = htons(PORT); server_addr.sin_addr.s_addr = inet_addr("127.0.0.1"); printf("Connecting to 127.0.0.1:%d...\n", PORT); if (connect(sock_fd, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) { perror("Connection failed"); close(sock_fd); exit(EXIT_FAILURE); } printf("Connected successfully\n"); sleep(3); printf("\nStarting data transmission test...\n"); send(sock_fd, "A", 1, 0); while (1) { sleep(1); // wait } close(sock_fd); printf("Bye!\n"); return 0; } --- client.c ends --- --- server.c --- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <arpa/inet.h> #include <sys/socket.h> #include <netinet/in.h> #include <netinet/tcp.h> #include <errno.h> #include <signal.h> #define PORT 8888 #define BUFFER_SIZE 1024 #define BACKLOG 5 #ifndef IPPROTO_MPTCP #define IPPROTO_MPTCP 262 #endif int sock_fd = -1; int client_fd = -1; void handle_signal(int sig) { printf("\nSignal %d received, cleaning up...\n", sig); if (client_fd > 0) close(client_fd); if (sock_fd > 0) close(sock_fd); exit(0); } int main() { struct sockaddr_in server_addr, client_addr; socklen_t client_len = sizeof(client_addr); char buffer[BUFFER_SIZE]; int enable = 1; long total_received = 0; long long start_time = 0; signal(SIGINT, handle_signal); signal(SIGTERM, handle_signal); if ((sock_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)) < 0) { perror("Socket creation failed"); exit(EXIT_FAILURE); } printf("Created MPTCP socket\n"); if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) < 0) { perror("Setsockopt SO_REUSEADDR failed"); close(sock_fd); exit(EXIT_FAILURE); } memset(&server_addr, 0, sizeof(server_addr)); server_addr.sin_family = AF_INET; server_addr.sin_port = htons(PORT); server_addr.sin_addr.s_addr = INADDR_ANY; if (bind(sock_fd, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) { perror("Bind failed"); close(sock_fd); exit(EXIT_FAILURE); } printf("Bound to port %d\n", PORT); if (listen(sock_fd, BACKLOG) < 0) { perror("Listen failed"); close(sock_fd); exit(EXIT_FAILURE); } printf("Listening for MPTCP connections...\n"); client_fd = accept(sock_fd, (struct sockaddr*)&client_addr, &client_len); if (client_fd < 0) { perror("Accept failed"); close(sock_fd); exit(EXIT_FAILURE); } printf("Client connected from %s:%d\n", inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port)); printf("\nWaiting for data...\n"); while (1) { int bytes_received = recv(client_fd, buffer, BUFFER_SIZE - 1, MSG_PEEK | MSG_WAITALL); if (bytes_received <= 0) { // Just break out break; } printf("Receive bytes size %d\n", bytes_received); } close(client_fd); close(sock_fd); return 0; } --- server.c ends --- Best regards, Li Xiasong ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() 2026-03-04 9:24 ` Li Xiasong @ 2026-03-23 11:19 ` Matthieu Baerts 0 siblings, 0 replies; 7+ messages in thread From: Matthieu Baerts @ 2026-03-23 11:19 UTC (permalink / raw) To: Li Xiasong Cc: geliang, davem, edumazet, kuba, pabeni, horms, martineau, netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong Hi Li, Sorry for the delay. On 04/03/2026 10:24, Li Xiasong wrote: > Hi Matt, > > On 3/4/2026 2:06 AM, Matthieu Baerts wrote: >> Hi Li, >> >> On 02/03/2026 06:26, Li Xiasong wrote: >>> syzbot reported a soft lockup in mptcp_recvmsg() [0]. >>> >>> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not >>> removed from the sk_receive_queue. This causes sk_wait_data() to always >>> find available data and never perform actual waiting, leading to a soft >>> lockup. >>> >>> Fix this by adding a 'last' parameter to track the last peeked skb. >>> This allows sk_wait_data() to make informed waiting decisions and prevent >>> infinite loops when MSG_PEEK is used. >> >> (...) >> >>> Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()") >>> Signed-off-by: Li Xiasong <lixiasong1@huawei.com> >>> --- >>> net/mptcp/protocol.c | 10 +++++++--- >>> 1 file changed, 7 insertions(+), 3 deletions(-) >>> >>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >>> index cf1852b99963..7a65c2101f63 100644 >>> --- a/net/mptcp/protocol.c >>> +++ b/net/mptcp/protocol.c >>> @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb) >>> static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, >>> size_t len, int flags, int copied_total, >>> struct scm_timestamping_internal *tss, >>> - int *cmsg_flags) >>> + int *cmsg_flags, struct sk_buff **last) >>> { >>> struct mptcp_sock *msk = mptcp_sk(sk); >>> struct sk_buff *skb, *tmp; >>> @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, >>> } >>> >>> mptcp_eat_recv_skb(sk, skb); >>> + } else { >>> + *last = skb; >> >> Out of curiosity, why only setting *last for MSG_PEEK? Is it not better >> to always call sk_wait_data() later with the last skb, even when >> MSG_PEEK is not used? >> >> Or will this cause other troubles? > > > Yes, unconditionally updating last (like tcp_recvmsg_locked) makes > sense. The current hesitation is due to mptcp_eat_recv_skb releasing the > skb in non-MSG_PEEK cases—if the address is reused, keeping a last > pointer could lead to misjudgment. I think setting "last" just after having incremented "copied" would not be confusing. >>> } >>> >>> if (copied >= len) >>> @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >>> { >>> struct mptcp_sock *msk = mptcp_sk(sk); >>> struct scm_timestamping_internal tss; >>> + struct sk_buff *last = NULL; >> >> Detail: the scope of this variable could eventually be reduced by moving >> it inside the while-loop. This should hopefully help to reduce conflicts >> during backports. >> > > > You're right. My initial thought was to move `last` into the while loop, > but in practice, to retain the last MSG_PEEK skb, `last` must be updated > very early in __mptcp_recvmsg_mskq as we begin traversing > &sk->sk_receive_queue. The issue is that if a subsequent step fails—such > as skb_copy_datagram_msg—we'd then need to roll `last` back to the > previous skb, which adds significant complexity. This suggests the > current approach may be the safer trade-off. I think "last" should be initialised to the last item of the received queue -- skb_peek_tail(&sk->sk_receive_queue) -- before walking it: that seems simpler and cover errors in previous calls, no? >>> int copied = 0, cmsg_flags = 0; >>> int target; >>> long timeo; >>> @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >>> int err, bytes_read; >>> >>> bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, >>> - copied, &tss, &cmsg_flags); >>> + copied, &tss, &cmsg_flags, >>> + &last); >>> if (unlikely(bytes_read < 0)) { >>> if (!copied) >>> copied = bytes_read; >>> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >>> >>> pr_debug("block timeout %ld\n", timeo); >>> mptcp_cleanup_rbuf(msk, copied); >>> - err = sk_wait_data(sk, &timeo, NULL); >>> + err = sk_wait_data(sk, &timeo, last); >>> if (err < 0) { >>> err = copied ? : err; >>> goto out_err; >> Cheers, >> Matt > > > As requested, here are the two minimal test programs. Thank you. These test programs couldn't be integrated in the test suite because they required a manual step (check CPU usage). Instead, I wrote a small packetdrill test: https://github.com/multipath-tcp/packetdrill/pull/192 There, you will also find a diff containing the modifications suggested above. Do you mind sending a v2 with them if that's OK, please? Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() 2026-03-02 5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong 2026-03-03 11:08 ` Matthieu Baerts 2026-03-03 18:06 ` Matthieu Baerts @ 2026-03-04 9:07 ` Paolo Abeni 2026-03-04 11:33 ` Li Xiasong 2 siblings, 1 reply; 7+ messages in thread From: Paolo Abeni @ 2026-03-04 9:07 UTC (permalink / raw) To: Li Xiasong, matttbe, martineau, geliang, davem, edumazet, kuba, horms Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong On 3/2/26 6:26 AM, Li Xiasong wrote: > @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, > > pr_debug("block timeout %ld\n", timeo); > mptcp_cleanup_rbuf(msk, copied); > - err = sk_wait_data(sk, &timeo, NULL); > + err = sk_wait_data(sk, &timeo, last); Isn't err = sk_wait_data(sk, &timeo, skb_peek_tail(&sk->sk_receive_queue)); Enough? Also, could you please add a paired tests-case? Thanks! Paolo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() 2026-03-04 9:07 ` Paolo Abeni @ 2026-03-04 11:33 ` Li Xiasong 0 siblings, 0 replies; 7+ messages in thread From: Li Xiasong @ 2026-03-04 11:33 UTC (permalink / raw) To: Paolo Abeni Cc: davem, edumazet, geliang, horms, kuba, linux-kernel, martineau, Matthieu Baerts, mptcp, netdev, weiyongjun1, yuehaibing, zhangchangzhong Hi Paolo, On 3/4/2026 5:07 PM, Paolo Abeni wrote: > On 3/2/26 6:26 AM, Li Xiasong wrote: >> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, >> >> pr_debug("block timeout %ld\n", timeo); >> mptcp_cleanup_rbuf(msk, copied); >> - err = sk_wait_data(sk, &timeo, NULL); >> + err = sk_wait_data(sk, &timeo, last); > > Isn't > err = sk_wait_data(sk, &timeo, skb_peek_tail(&sk->sk_receive_queue)); > > Enough? Thanks for the review! In the general case, letting sk_wait_data block is correct. The exception is within __mptcp_recvmsg_mskq, where a failure in skb_copy_datagram_msg will cause a break in the consumption loop. This leaves the actual processing position inconsistent with skb_peek_tail(&sk->sk_receive_queue), and at that point, we must not wait. > > Also, could you please add a paired tests-case? Thanks! > Further to our discussion, a minimal reproduction case for this scenario is available here: https://lore.kernel.org/netdev/e91bf909-bf4d-4f90-a370-688a9424478b@huawei.com/ When running the reproduced program, you can observe the server's sy (system) CPU usage hitting 100%. Thanks, Li Xiasong ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-03-23 11:19 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-02 5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong 2026-03-03 11:08 ` Matthieu Baerts 2026-03-03 18:06 ` Matthieu Baerts 2026-03-04 9:24 ` Li Xiasong 2026-03-23 11:19 ` Matthieu Baerts 2026-03-04 9:07 ` Paolo Abeni 2026-03-04 11:33 ` Li Xiasong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox