public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
@ 2026-03-02  5:26 Li Xiasong
  2026-03-03 11:08 ` Matthieu Baerts
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Li Xiasong @ 2026-03-02  5:26 UTC (permalink / raw)
  To: matttbe, martineau, geliang, davem, edumazet, kuba, pabeni, horms
  Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing,
	zhangchangzhong

syzbot reported a soft lockup in mptcp_recvmsg() [0].

When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
removed from the sk_receive_queue. This causes sk_wait_data() to always
find available data and never perform actual waiting, leading to a soft
lockup.

Fix this by adding a 'last' parameter to track the last peeked skb.
This allows sk_wait_data() to make informed waiting decisions and prevent
infinite loops when MSG_PEEK is used.

[0]:
watchdog: BUG: soft lockup - CPU#2 stuck for 156s! [server:1963]
Modules linked in:
CPU: 2 UID: 0 PID: 1963 Comm: server Not tainted 6.19.0-rc8 #61 PREEMPT(none)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:sk_wait_data+0x15/0x190
Code: 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 56 41 55 41 54 49 89 f4 55 48 89 d5 53 48 89 fb <48> 83 ec 30 65 48 8b 05 17 a4 6b 01 48 89 44 24 28 31 c0 65 48 8b
RSP: 0018:ffffc90000603ca0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff888102bf0800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc90000603d18 RDI: ffff888102bf0800
RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000101
R10: 0000000000000000 R11: 0000000000000075 R12: ffffc90000603d18
R13: ffff888102bf0800 R14: ffff888102bf0800 R15: 0000000000000000
FS:  00007f6e38b8c4c0(0000) GS:ffff8881b877e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055aa7bff1680 CR3: 0000000105cbe000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 mptcp_recvmsg+0x547/0x8c0 net/mptcp/protocol.c:2329
 inet_recvmsg+0x11f/0x130 net/ipv4/af_inet.c:891
 sock_recvmsg+0x94/0xc0 net/socket.c:1100
 __sys_recvfrom+0xb2/0x130 net/socket.c:2256
 __x64_sys_recvfrom+0x1f/0x30 net/socket.c:2267
 do_syscall_64+0x59/0x2d0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e arch/x86/entry/entry_64.S:131
RIP: 0033:0x7f6e386a4a1d
Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 f1 de 2c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
RSP: 002b:00007ffc3c4bb078 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: 000000000000861e RCX: 00007f6e386a4a1d
RDX: 00000000000003ff RSI: 00007ffc3c4bb150 RDI: 0000000000000004
RBP: 00007ffc3c4bb570 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000103 R11: 0000000000000246 R12: 00005605dbc00be0
R13: 00007ffc3c4bb650 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()")
Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
---
 net/mptcp/protocol.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index cf1852b99963..7a65c2101f63 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
 static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
 				size_t len, int flags, int copied_total,
 				struct scm_timestamping_internal *tss,
-				int *cmsg_flags)
+				int *cmsg_flags, struct sk_buff **last)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *skb, *tmp;
@@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
 			}
 
 			mptcp_eat_recv_skb(sk, skb);
+		} else {
+			*last = skb;
 		}
 
 		if (copied >= len)
@@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct scm_timestamping_internal tss;
+	struct sk_buff *last = NULL;
 	int copied = 0, cmsg_flags = 0;
 	int target;
 	long timeo;
@@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		int err, bytes_read;
 
 		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
-						  copied, &tss, &cmsg_flags);
+						  copied, &tss, &cmsg_flags,
+						  &last);
 		if (unlikely(bytes_read < 0)) {
 			if (!copied)
 				copied = bytes_read;
@@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 
 		pr_debug("block timeout %ld\n", timeo);
 		mptcp_cleanup_rbuf(msk, copied);
-		err = sk_wait_data(sk, &timeo, NULL);
+		err = sk_wait_data(sk, &timeo, last);
 		if (err < 0) {
 			err = copied ? : err;
 			goto out_err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
  2026-03-02  5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong
@ 2026-03-03 11:08 ` Matthieu Baerts
  2026-03-03 18:06 ` Matthieu Baerts
  2026-03-04  9:07 ` Paolo Abeni
  2 siblings, 0 replies; 7+ messages in thread
From: Matthieu Baerts @ 2026-03-03 11:08 UTC (permalink / raw)
  To: Li Xiasong, martineau, geliang, davem, edumazet, kuba, pabeni,
	horms
  Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing,
	zhangchangzhong

Hi Li,

On 02/03/2026 06:26, Li Xiasong wrote:
> syzbot reported a soft lockup in mptcp_recvmsg() [0].
> 
> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
> removed from the sk_receive_queue. This causes sk_wait_data() to always
> find available data and never perform actual waiting, leading to a soft
> lockup.
> 
> Fix this by adding a 'last' parameter to track the last peeked skb.
> This allows sk_wait_data() to make informed waiting decisions and prevent
> infinite loops when MSG_PEEK is used.

Thank you for the patch! I will check why sk_wait_data() was not used
with the last SKB before.

By chance, do you have a reproducer that you could eventually share please?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
  2026-03-02  5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong
  2026-03-03 11:08 ` Matthieu Baerts
@ 2026-03-03 18:06 ` Matthieu Baerts
  2026-03-04  9:24   ` Li Xiasong
  2026-03-04  9:07 ` Paolo Abeni
  2 siblings, 1 reply; 7+ messages in thread
From: Matthieu Baerts @ 2026-03-03 18:06 UTC (permalink / raw)
  To: Li Xiasong, martineau, geliang, davem, edumazet, kuba, pabeni,
	horms
  Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing,
	zhangchangzhong

Hi Li,

On 02/03/2026 06:26, Li Xiasong wrote:
> syzbot reported a soft lockup in mptcp_recvmsg() [0].
> 
> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
> removed from the sk_receive_queue. This causes sk_wait_data() to always
> find available data and never perform actual waiting, leading to a soft
> lockup.
> 
> Fix this by adding a 'last' parameter to track the last peeked skb.
> This allows sk_wait_data() to make informed waiting decisions and prevent
> infinite loops when MSG_PEEK is used.

(...)

> Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()")
> Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
> ---
>  net/mptcp/protocol.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index cf1852b99963..7a65c2101f63 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
>  static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>  				size_t len, int flags, int copied_total,
>  				struct scm_timestamping_internal *tss,
> -				int *cmsg_flags)
> +				int *cmsg_flags, struct sk_buff **last)
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
>  	struct sk_buff *skb, *tmp;
> @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>  			}
>  
>  			mptcp_eat_recv_skb(sk, skb);
> +		} else {
> +			*last = skb;

Out of curiosity, why only setting *last for MSG_PEEK? Is it not better
to always call sk_wait_data() later with the last skb, even when
MSG_PEEK is not used?

Or will this cause other troubles?

>  		}
>  
>  		if (copied >= len)
> @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
>  	struct scm_timestamping_internal tss;
> +	struct sk_buff *last = NULL;

Detail: the scope of this variable could eventually be reduced by moving
it inside the while-loop. This should hopefully help to reduce conflicts
during backports.

>  	int copied = 0, cmsg_flags = 0;
>  	int target;
>  	long timeo;
> @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  		int err, bytes_read;
>  
>  		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
> -						  copied, &tss, &cmsg_flags);
> +						  copied, &tss, &cmsg_flags,
> +						  &last);
>  		if (unlikely(bytes_read < 0)) {
>  			if (!copied)
>  				copied = bytes_read;
> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  
>  		pr_debug("block timeout %ld\n", timeo);
>  		mptcp_cleanup_rbuf(msk, copied);
> -		err = sk_wait_data(sk, &timeo, NULL);
> +		err = sk_wait_data(sk, &timeo, last);
>  		if (err < 0) {
>  			err = copied ? : err;
>  			goto out_err;
Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
  2026-03-02  5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong
  2026-03-03 11:08 ` Matthieu Baerts
  2026-03-03 18:06 ` Matthieu Baerts
@ 2026-03-04  9:07 ` Paolo Abeni
  2026-03-04 11:33   ` Li Xiasong
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2026-03-04  9:07 UTC (permalink / raw)
  To: Li Xiasong, matttbe, martineau, geliang, davem, edumazet, kuba,
	horms
  Cc: netdev, mptcp, linux-kernel, weiyongjun1, yuehaibing,
	zhangchangzhong

On 3/2/26 6:26 AM, Li Xiasong wrote:
> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  
>  		pr_debug("block timeout %ld\n", timeo);
>  		mptcp_cleanup_rbuf(msk, copied);
> -		err = sk_wait_data(sk, &timeo, NULL);
> +		err = sk_wait_data(sk, &timeo, last);

Isn't
		err = sk_wait_data(sk, &timeo,  skb_peek_tail(&sk->sk_receive_queue));

Enough?

Also, could you please add a paired tests-case? Thanks!

Paolo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
  2026-03-03 18:06 ` Matthieu Baerts
@ 2026-03-04  9:24   ` Li Xiasong
  2026-03-23 11:19     ` Matthieu Baerts
  0 siblings, 1 reply; 7+ messages in thread
From: Li Xiasong @ 2026-03-04  9:24 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: geliang, davem, edumazet, kuba, pabeni, horms, martineau, netdev,
	mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong

Hi Matt,

On 3/4/2026 2:06 AM, Matthieu Baerts wrote:
> Hi Li,
> 
> On 02/03/2026 06:26, Li Xiasong wrote:
>> syzbot reported a soft lockup in mptcp_recvmsg() [0].
>>
>> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
>> removed from the sk_receive_queue. This causes sk_wait_data() to always
>> find available data and never perform actual waiting, leading to a soft
>> lockup.
>>
>> Fix this by adding a 'last' parameter to track the last peeked skb.
>> This allows sk_wait_data() to make informed waiting decisions and prevent
>> infinite loops when MSG_PEEK is used.
> 
> (...)
> 
>> Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()")
>> Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
>> ---
>>  net/mptcp/protocol.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index cf1852b99963..7a65c2101f63 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
>>  static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>>  				size_t len, int flags, int copied_total,
>>  				struct scm_timestamping_internal *tss,
>> -				int *cmsg_flags)
>> +				int *cmsg_flags, struct sk_buff **last)
>>  {
>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>>  	struct sk_buff *skb, *tmp;
>> @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>>  			}
>>  
>>  			mptcp_eat_recv_skb(sk, skb);
>> +		} else {
>> +			*last = skb;
> 
> Out of curiosity, why only setting *last for MSG_PEEK? Is it not better
> to always call sk_wait_data() later with the last skb, even when
> MSG_PEEK is not used?
> 
> Or will this cause other troubles?


Yes, unconditionally updating last (like tcp_recvmsg_locked) makes
sense. The current hesitation is due to mptcp_eat_recv_skb releasing the
skb in non-MSG_PEEK cases—if the address is reused, keeping a last
pointer could lead to misjudgment.

> 
>>  		}
>>  
>>  		if (copied >= len)
>> @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>  {
>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>>  	struct scm_timestamping_internal tss;
>> +	struct sk_buff *last = NULL;
> 
> Detail: the scope of this variable could eventually be reduced by moving
> it inside the while-loop. This should hopefully help to reduce conflicts
> during backports.
> 


You're right. My initial thought was to move `last` into the while loop,
but in practice, to retain the last MSG_PEEK skb, `last` must be updated
very early in __mptcp_recvmsg_mskq as we begin traversing
&sk->sk_receive_queue. The issue is that if a subsequent step fails—such
as skb_copy_datagram_msg—we'd then need to roll `last` back to the
previous skb, which adds significant complexity. This suggests the
current approach may be the safer trade-off.

>>  	int copied = 0, cmsg_flags = 0;
>>  	int target;
>>  	long timeo;
>> @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>  		int err, bytes_read;
>>  
>>  		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
>> -						  copied, &tss, &cmsg_flags);
>> +						  copied, &tss, &cmsg_flags,
>> +						  &last);
>>  		if (unlikely(bytes_read < 0)) {
>>  			if (!copied)
>>  				copied = bytes_read;
>> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>  
>>  		pr_debug("block timeout %ld\n", timeo);
>>  		mptcp_cleanup_rbuf(msk, copied);
>> -		err = sk_wait_data(sk, &timeo, NULL);
>> +		err = sk_wait_data(sk, &timeo, last);
>>  		if (err < 0) {
>>  			err = copied ? : err;
>>  			goto out_err;
> Cheers,
> Matt


As requested, here are the two minimal test programs.

Steps to reproduce
1.Save as client.c and server.c.

2.Compile:
    gcc client.c -o client
    gcc server.c -o server

3.Run ./server, then ./client in another terminal.

--- client.c ---

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <time.h>
#include <errno.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

#define PORT 8888

int main() {
    int sock_fd;
    struct sockaddr_in server_addr;
    int enable = 1;

    if ((sock_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)) < 0) {
        perror("Socket creation failed");
        exit(EXIT_FAILURE);
    }
    printf("Created MPTCP socket\n");

    if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &enable,
sizeof(enable)) < 0) {
        perror("Setsockopt SO_REUSEADDR failed");
        close(sock_fd);
        exit(EXIT_FAILURE);
    }

    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(PORT);
    server_addr.sin_addr.s_addr = inet_addr("127.0.0.1");

    printf("Connecting to 127.0.0.1:%d...\n", PORT);
    if (connect(sock_fd, (struct sockaddr*)&server_addr,
sizeof(server_addr)) < 0) {
        perror("Connection failed");
        close(sock_fd);
        exit(EXIT_FAILURE);
    }
    printf("Connected successfully\n");

    sleep(3);
    printf("\nStarting data transmission test...\n");

    send(sock_fd, "A", 1, 0);

    while (1) {
        sleep(1); // wait
    }

    close(sock_fd);

    printf("Bye!\n");
    return 0;
}

--- client.c ends ---

--- server.c ---

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>
#include <signal.h>

#define PORT 8888
#define BUFFER_SIZE 1024
#define BACKLOG 5

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

int sock_fd = -1;
int client_fd = -1;

void handle_signal(int sig) {
    printf("\nSignal %d received, cleaning up...\n", sig);
    if (client_fd > 0) close(client_fd);
    if (sock_fd > 0) close(sock_fd);
    exit(0);
}

int main() {
    struct sockaddr_in server_addr, client_addr;
    socklen_t client_len = sizeof(client_addr);
    char buffer[BUFFER_SIZE];
    int enable = 1;
    long total_received = 0;
    long long start_time = 0;

    signal(SIGINT, handle_signal);
    signal(SIGTERM, handle_signal);

    if ((sock_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)) < 0) {
        perror("Socket creation failed");
        exit(EXIT_FAILURE);
    }
    printf("Created MPTCP socket\n");

    if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &enable,
sizeof(enable)) < 0) {
        perror("Setsockopt SO_REUSEADDR failed");
        close(sock_fd);
        exit(EXIT_FAILURE);
    }

    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(PORT);
    server_addr.sin_addr.s_addr = INADDR_ANY;

    if (bind(sock_fd, (struct sockaddr*)&server_addr,
sizeof(server_addr)) < 0) {
        perror("Bind failed");
        close(sock_fd);
        exit(EXIT_FAILURE);
    }
    printf("Bound to port %d\n", PORT);

    if (listen(sock_fd, BACKLOG) < 0) {
        perror("Listen failed");
        close(sock_fd);
        exit(EXIT_FAILURE);
    }
    printf("Listening for MPTCP connections...\n");

    client_fd = accept(sock_fd, (struct sockaddr*)&client_addr,
&client_len);
    if (client_fd < 0) {
        perror("Accept failed");
        close(sock_fd);
        exit(EXIT_FAILURE);
    }
    printf("Client connected from %s:%d\n",
           inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));

    printf("\nWaiting for data...\n");

    while (1) {
        int bytes_received = recv(client_fd, buffer, BUFFER_SIZE - 1,
MSG_PEEK | MSG_WAITALL);
        if (bytes_received <= 0) {
            // Just break out
            break;
        }
        printf("Receive bytes size %d\n", bytes_received);
    }

    close(client_fd);
    close(sock_fd);
    return 0;
}

--- server.c ends ---

Best regards,
Li Xiasong





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
  2026-03-04  9:07 ` Paolo Abeni
@ 2026-03-04 11:33   ` Li Xiasong
  0 siblings, 0 replies; 7+ messages in thread
From: Li Xiasong @ 2026-03-04 11:33 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: davem, edumazet, geliang, horms, kuba, linux-kernel, martineau,
	Matthieu Baerts, mptcp, netdev, weiyongjun1, yuehaibing,
	zhangchangzhong

Hi Paolo,

On 3/4/2026 5:07 PM, Paolo Abeni wrote:
> On 3/2/26 6:26 AM, Li Xiasong wrote:
>> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>  
>>  		pr_debug("block timeout %ld\n", timeo);
>>  		mptcp_cleanup_rbuf(msk, copied);
>> -		err = sk_wait_data(sk, &timeo, NULL);
>> +		err = sk_wait_data(sk, &timeo, last);
> 
> Isn't
> 		err = sk_wait_data(sk, &timeo,  skb_peek_tail(&sk->sk_receive_queue));
> 
> Enough?

Thanks for the review! In the general case, letting sk_wait_data block is
correct. The exception is within __mptcp_recvmsg_mskq, where a failure in
skb_copy_datagram_msg will cause a break in the consumption loop. This
leaves the actual processing position inconsistent with
skb_peek_tail(&sk->sk_receive_queue), and at that point, we must not wait.

> 
> Also, could you please add a paired tests-case? Thanks!
> 

Further to our discussion, a minimal reproduction case for this scenario is
available here:
https://lore.kernel.org/netdev/e91bf909-bf4d-4f90-a370-688a9424478b@huawei.com/

When running the reproduced program, you can observe the server's sy
(system) CPU usage hitting 100%.

Thanks,
Li Xiasong


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
  2026-03-04  9:24   ` Li Xiasong
@ 2026-03-23 11:19     ` Matthieu Baerts
  0 siblings, 0 replies; 7+ messages in thread
From: Matthieu Baerts @ 2026-03-23 11:19 UTC (permalink / raw)
  To: Li Xiasong
  Cc: geliang, davem, edumazet, kuba, pabeni, horms, martineau, netdev,
	mptcp, linux-kernel, weiyongjun1, yuehaibing, zhangchangzhong

Hi Li,

Sorry for the delay.

On 04/03/2026 10:24, Li Xiasong wrote:
> Hi Matt,
> 
> On 3/4/2026 2:06 AM, Matthieu Baerts wrote:
>> Hi Li,
>>
>> On 02/03/2026 06:26, Li Xiasong wrote:
>>> syzbot reported a soft lockup in mptcp_recvmsg() [0].
>>>
>>> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
>>> removed from the sk_receive_queue. This causes sk_wait_data() to always
>>> find available data and never perform actual waiting, leading to a soft
>>> lockup.
>>>
>>> Fix this by adding a 'last' parameter to track the last peeked skb.
>>> This allows sk_wait_data() to make informed waiting decisions and prevent
>>> infinite loops when MSG_PEEK is used.
>>
>> (...)
>>
>>> Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()")
>>> Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
>>> ---
>>>  net/mptcp/protocol.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index cf1852b99963..7a65c2101f63 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
>>>  static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>>>  				size_t len, int flags, int copied_total,
>>>  				struct scm_timestamping_internal *tss,
>>> -				int *cmsg_flags)
>>> +				int *cmsg_flags, struct sk_buff **last)
>>>  {
>>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>>>  	struct sk_buff *skb, *tmp;
>>> @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>>>  			}
>>>  
>>>  			mptcp_eat_recv_skb(sk, skb);
>>> +		} else {
>>> +			*last = skb;
>>
>> Out of curiosity, why only setting *last for MSG_PEEK? Is it not better
>> to always call sk_wait_data() later with the last skb, even when
>> MSG_PEEK is not used?
>>
>> Or will this cause other troubles?
> 
> 
> Yes, unconditionally updating last (like tcp_recvmsg_locked) makes
> sense. The current hesitation is due to mptcp_eat_recv_skb releasing the
> skb in non-MSG_PEEK cases—if the address is reused, keeping a last
> pointer could lead to misjudgment.

I think setting "last" just after having incremented "copied" would not
be confusing.

>>>  		}
>>>  
>>>  		if (copied >= len)
>>> @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>>  {
>>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>>>  	struct scm_timestamping_internal tss;
>>> +	struct sk_buff *last = NULL;
>>
>> Detail: the scope of this variable could eventually be reduced by moving
>> it inside the while-loop. This should hopefully help to reduce conflicts
>> during backports.
>>
> 
> 
> You're right. My initial thought was to move `last` into the while loop,
> but in practice, to retain the last MSG_PEEK skb, `last` must be updated
> very early in __mptcp_recvmsg_mskq as we begin traversing
> &sk->sk_receive_queue. The issue is that if a subsequent step fails—such
> as skb_copy_datagram_msg—we'd then need to roll `last` back to the
> previous skb, which adds significant complexity. This suggests the
> current approach may be the safer trade-off.

I think "last" should be initialised to the last item of the received
queue -- skb_peek_tail(&sk->sk_receive_queue) -- before walking it: that
seems simpler and cover errors in previous calls, no?

>>>  	int copied = 0, cmsg_flags = 0;
>>>  	int target;
>>>  	long timeo;
>>> @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>>  		int err, bytes_read;
>>>  
>>>  		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
>>> -						  copied, &tss, &cmsg_flags);
>>> +						  copied, &tss, &cmsg_flags,
>>> +						  &last);
>>>  		if (unlikely(bytes_read < 0)) {
>>>  			if (!copied)
>>>  				copied = bytes_read;
>>> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>>  
>>>  		pr_debug("block timeout %ld\n", timeo);
>>>  		mptcp_cleanup_rbuf(msk, copied);
>>> -		err = sk_wait_data(sk, &timeo, NULL);
>>> +		err = sk_wait_data(sk, &timeo, last);
>>>  		if (err < 0) {
>>>  			err = copied ? : err;
>>>  			goto out_err;
>> Cheers,
>> Matt
> 
> 
> As requested, here are the two minimal test programs.

Thank you. These test programs couldn't be integrated in the test suite
because they required a manual step (check CPU usage). Instead, I wrote
a small packetdrill test:

  https://github.com/multipath-tcp/packetdrill/pull/192

There, you will also find a diff containing the modifications suggested
above. Do you mind sending a v2 with them if that's OK, please?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-03-23 11:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02  5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong
2026-03-03 11:08 ` Matthieu Baerts
2026-03-03 18:06 ` Matthieu Baerts
2026-03-04  9:24   ` Li Xiasong
2026-03-23 11:19     ` Matthieu Baerts
2026-03-04  9:07 ` Paolo Abeni
2026-03-04 11:33   ` Li Xiasong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox