public inbox for netdev@vger.kernel.org
From: Matthieu Baerts <matttbe@kernel.org>
To: Li Xiasong <lixiasong1@huawei.com>
Cc: geliang@kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	martineau@kernel.org, netdev@vger.kernel.org,
	mptcp@lists.linux.dev, linux-kernel@vger.kernel.org,
	weiyongjun1@huawei.com, yuehaibing@huawei.com,
	zhangchangzhong@huawei.com
Subject: Re: [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg()
Date: Mon, 23 Mar 2026 12:19:38 +0100
Message-ID: <c23a41be-afb5-4b28-a26c-2e29c210aa98@kernel.org>
In-Reply-To: <e91bf909-bf4d-4f90-a370-688a9424478b@huawei.com>

Hi Li,

Sorry for the delay.

On 04/03/2026 10:24, Li Xiasong wrote:
> Hi Matt,
> 
> On 3/4/2026 2:06 AM, Matthieu Baerts wrote:
>> Hi Li,
>>
>> On 02/03/2026 06:26, Li Xiasong wrote:
>>> syzbot reported a soft lockup in mptcp_recvmsg() [0].
>>>
>>> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
>>> removed from the sk_receive_queue. This causes sk_wait_data() to always
>>> find available data and never perform actual waiting, leading to a soft
>>> lockup.
>>>
>>> Fix this by adding a 'last' parameter to track the last peeked skb.
>>> This allows sk_wait_data() to make informed waiting decisions and prevent
>>> infinite loops when MSG_PEEK is used.
>>
>> (...)
>>
>>> Fixes: 612f71d7328c ("mptcp: fix possible stall on recvmsg()")
>>> Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
>>> ---
>>>  net/mptcp/protocol.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index cf1852b99963..7a65c2101f63 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -2006,7 +2006,7 @@ static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
>>>  static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>>>  				size_t len, int flags, int copied_total,
>>>  				struct scm_timestamping_internal *tss,
>>> -				int *cmsg_flags)
>>> +				int *cmsg_flags, struct sk_buff **last)
>>>  {
>>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>>>  	struct sk_buff *skb, *tmp;
>>> @@ -2058,6 +2058,8 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
>>>  			}
>>>  
>>>  			mptcp_eat_recv_skb(sk, skb);
>>> +		} else {
>>> +			*last = skb;
>>
>> Out of curiosity, why only setting *last for MSG_PEEK? Is it not better
>> to always call sk_wait_data() later with the last skb, even when
>> MSG_PEEK is not used?
>>
>> Or will this cause other troubles?
> 
> 
> Yes, unconditionally updating last (like tcp_recvmsg_locked) makes
> sense. The current hesitation is due to mptcp_eat_recv_skb releasing the
> skb in non-MSG_PEEK cases—if the address is reused, keeping a last
> pointer could lead to misjudgment.

I think setting "last" just after having incremented "copied" would not
be confusing.
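
To make that placement concrete, here is a minimal userspace sketch, not
kernel code. The names fake_skb, fake_queue, fake_enqueue() and
fake_recv_walk() are hypothetical stand-ins for struct sk_buff,
sk_receive_queue and the walk done in __mptcp_recvmsg_mskq(); the copy
itself is reduced to byte counting. It only shows "last" being recorded
right after "copied" is incremented, with or without MSG_PEEK:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and sk_receive_queue. */
struct fake_skb {
	struct fake_skb *next;
	size_t len;
};

struct fake_queue {
	struct fake_skb *head;
	struct fake_skb *tail;
};

static void fake_enqueue(struct fake_queue *q, struct fake_skb *skb)
{
	skb->next = NULL;
	if (q->tail)
		q->tail->next = skb;
	else
		q->head = skb;
	q->tail = skb;
}

/*
 * Walk the queue and "copy" up to len bytes. Returns the number of bytes
 * accounted for. *last is updated in both the peek and non-peek cases.
 */
static size_t fake_recv_walk(struct fake_queue *q, size_t len, bool peek,
			     struct fake_skb **last)
{
	struct fake_skb *skb = q->head;
	size_t copied = 0;

	assert(last);
	while (skb && copied < len) {
		struct fake_skb *next = skb->next;
		size_t left = len - copied;
		size_t count = skb->len < left ? skb->len : left;

		copied += count;
		/* Recorded right after 'copied'; later only compared, never dereferenced. */
		*last = skb;

		if (!peek) {
			if (count == skb->len) {
				/* Fully consumed: unlink it (the kernel would free it here). */
				q->head = next;
				if (!next)
					q->tail = NULL;
			} else {
				/* Partially consumed: keep the remainder queued. */
				skb->len -= count;
			}
		}
		skb = next;
	}
	return copied;
}
```

In this model, a peek pass leaves the queue untouched and ends with "last"
pointing at the tail, while a consuming pass ends with "last" pointing at
the last entry that was read, even if that entry has been unlinked.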

>>>  		}
>>>  
>>>  		if (copied >= len)
>>> @@ -2263,6 +2265,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>>  {
>>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>>>  	struct scm_timestamping_internal tss;
>>> +	struct sk_buff *last = NULL;
>>
>> Detail: the scope of this variable could eventually be reduced by moving
>> it inside the while-loop. This should hopefully help to reduce conflicts
>> during backports.
>>
> 
> 
> You're right. My initial thought was to move `last` into the while loop,
> but in practice, to retain the last MSG_PEEK skb, `last` must be updated
> very early in __mptcp_recvmsg_mskq as we begin traversing
> &sk->sk_receive_queue. The issue is that if a subsequent step fails—such
> as skb_copy_datagram_msg—we'd then need to roll `last` back to the
> previous skb, which adds significant complexity. This suggests the
> current approach may be the safer trade-off.

I think "last" should be initialised to the last item of the receive
queue -- skb_peek_tail(&sk->sk_receive_queue) -- before walking it: that
seems simpler and covers errors in previous calls, no?
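
As a rough illustration of why the tail snapshot helps, here is a small
userspace model, not kernel code. It assumes the wake-up condition used by
sk_wait_data() is "the tail of sk_receive_queue differs from the skb passed
in"; fake_skb, fake_queue, fake_enqueue(), fake_data_ready() and
fake_peek_passes() are hypothetical names used only for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and sk_receive_queue. */
struct fake_skb {
	struct fake_skb *next;
	size_t len;
};

struct fake_queue {
	struct fake_skb *head;
	struct fake_skb *tail;
};

static void fake_enqueue(struct fake_queue *q, struct fake_skb *skb)
{
	skb->next = NULL;
	if (q->tail)
		q->tail->next = skb;
	else
		q->head = skb;
	q->tail = skb;
}

/* Assumed model of the sk_wait_data() condition: tail != last. */
static bool fake_data_ready(const struct fake_queue *q,
			    const struct fake_skb *last)
{
	return q->tail != last;
}

/*
 * Count how many peek passes a reader performs before it would sleep,
 * capped at max_passes. Peeking never dequeues anything, so the queue
 * is left unchanged between passes.
 */
static int fake_peek_passes(const struct fake_queue *q, bool track_last,
			    int max_passes)
{
	const struct fake_skb *last = NULL;
	int passes = 0;

	assert(max_passes > 0);
	while (passes < max_passes) {
		if (track_last)
			last = q->tail;	/* snapshot before walking the queue */
		passes++;		/* one peek walk over the queue */
		if (!fake_data_ready(q, last))
			break;		/* the reader would sleep here */
	}
	return passes;
}
```

With "last" left at NULL, a non-empty queue always looks ready, so the
peeking reader keeps spinning until the cap. With the tail snapshot, it
stops after one pass and only becomes ready again once a new entry is
queued.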

>>>  	int copied = 0, cmsg_flags = 0;
>>>  	int target;
>>>  	long timeo;
>>> @@ -2291,7 +2294,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>>  		int err, bytes_read;
>>>  
>>>  		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
>>> -						  copied, &tss, &cmsg_flags);
>>> +						  copied, &tss, &cmsg_flags,
>>> +						  &last);
>>>  		if (unlikely(bytes_read < 0)) {
>>>  			if (!copied)
>>>  				copied = bytes_read;
>>> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>>  
>>>  		pr_debug("block timeout %ld\n", timeo);
>>>  		mptcp_cleanup_rbuf(msk, copied);
>>> -		err = sk_wait_data(sk, &timeo, NULL);
>>> +		err = sk_wait_data(sk, &timeo, last);
>>>  		if (err < 0) {
>>>  			err = copied ? : err;
>>>  			goto out_err;
>> Cheers,
>> Matt
> 
> 
> As requested, here are the two minimal test programs.

Thank you. These test programs couldn't be integrated in the test suite
because they required a manual step (check CPU usage). Instead, I wrote
a small packetdrill test:

  https://github.com/multipath-tcp/packetdrill/pull/192

There, you will also find a diff containing the modifications suggested
above. If that's OK with you, would you mind sending a v2 including them?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


Thread overview: 7+ messages
2026-03-02  5:26 [PATCH net] mptcp: fix soft lockup in mptcp_recvmsg() Li Xiasong
2026-03-03 11:08 ` Matthieu Baerts
2026-03-03 18:06 ` Matthieu Baerts
2026-03-04  9:24   ` Li Xiasong
2026-03-23 11:19     ` Matthieu Baerts [this message]
2026-03-04  9:07 ` Paolo Abeni
2026-03-04 11:33   ` Li Xiasong
