From: Mat Martineau <martineau@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH v5 mptcp-next 10/10] mptcp: leverage the backlog for RX packet processing
Date: Tue, 21 Oct 2025 16:53:33 -0700 (PDT)
Message-ID: <0c2291ba-6876-e67b-1607-a56203164ec6@kernel.org>
In-Reply-To: <69af34f2-5a75-4a02-a183-9609cad4abae@redhat.com>

On Tue, 21 Oct 2025, Paolo Abeni wrote:
> On 10/21/25 1:32 AM, Mat Martineau wrote:
>> On Mon, 6 Oct 2025, Paolo Abeni wrote:
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 2d5d3da67d1ac..a97a92eccc502 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>
>> ...
>>
>>> @@ -3509,23 +3519,29 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
>>>
>>> #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
>>> BIT(MPTCP_RETRANSMIT) | \
>>> - BIT(MPTCP_FLUSH_JOIN_LIST) | \
>>> - BIT(MPTCP_DEQUEUE))
>>> + BIT(MPTCP_FLUSH_JOIN_LIST))
>>>
>>> /* processes deferred events and flush wmem */
>>> static void mptcp_release_cb(struct sock *sk)
>>> __must_hold(&sk->sk_lock.slock)
>>> {
>>> struct mptcp_sock *msk = mptcp_sk(sk);
>>> + u32 delta = 0;
>>>
>>> for (;;) {
>>> unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
>>> - struct list_head join_list;
>>> + LIST_HEAD(join_list);
>>> + LIST_HEAD(skbs);
>>> +
>>> + sk_forward_alloc_add(sk, msk->borrowed_mem);
>>> + msk->borrowed_mem = 0;
>>> +
>>> + if (sk_rmem_alloc_get(sk) < sk->sk_rcvbuf)
>>> + list_splice_init(&msk->backlog_list, &skbs);
>>>
>>> - if (!flags)
>>> + if (!flags && list_empty(&skbs))
>>> break;
>>>
>>> - INIT_LIST_HEAD(&join_list);
>>> list_splice_init(&msk->join_list, &join_list);
>>>
>>> /* the following actions acquire the subflow socket lock
>>> @@ -3544,7 +3560,8 @@ static void mptcp_release_cb(struct sock *sk)
>>> __mptcp_push_pending(sk, 0);
>>> if (flags & BIT(MPTCP_RETRANSMIT))
>>> __mptcp_retrans(sk);
>>> - if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) {
>>> + if (!list_empty(&skbs) &&
>>> + __mptcp_move_skbs(sk, &skbs, &delta)) {
>>> /* notify ack seq update */
>>> mptcp_cleanup_rbuf(msk, 0);
>>> sk->sk_data_ready(sk);
>>> @@ -3552,7 +3569,9 @@ static void mptcp_release_cb(struct sock *sk)
>>>
>>> cond_resched();
>>> spin_lock_bh(&sk->sk_lock.slock);
>>> + list_splice(&skbs, &msk->backlog_list);
>>> }
>>> + WRITE_ONCE(msk->backlog_len, msk->backlog_len - delta);
>>
>> Hi Paolo -
>>
>> Given the possible multiple calls to __mptcp_move_skbs() and that the
>> spinlock is released/reacquired (and the cond_resched) in the middle,
>> would it make sense to update msk->backlog_len for each iteration of the
>> loop so __mptcp_space() and mptcp_space() don't under-report available
>> space and mptcp_cleanup_rbuf() can make incremental progress?
>>
>> I know we don't want to WRITE_ONCE() more than necessary, but it seems
>> like there won't typically be more than one loop iteration. In the cases
>> where it does repeat the loop that means data is arriving quickly and
>> reporting mptcp_space accurately will be important.
>
> That WRITE_ONCE() is intentionally out of the loop, but not as an
> optimization, but for a functional goal, similar to:
>
> https://elixir.bootlin.com/linux/v6.17.4/source/net/core/sock.c#L3190
>
> without it, in exceptional situation, the loop could run for an
> unbounded amount of time.
>
> Given this is MPTCP-level, and packets went already through the TCP
> subflow possibly such scenario is even more unlikely, but I think it's
> still possible and serious enough we want to avoid it.
>

Ah, I think I see where that could happen: if the received packets get
discarded, they never get counted against the rcv buffer limits. "Double
counting" the most recently moved packets via backlog_len creates the
temporary appearance of no buffer space in that "wild producer" scenario.
I agree it still applies at the MPTCP level.

> WRT the receive buffer utilization, it should not change much, as either
> on the backlog or in the receiver buffer, the skb should be accounted
> for the whole truesize.

My concern is mostly that the moved skbs are accounted twice until the
loop finally exits, so in the non-adversarial case, where packets are
accepted, there would be an impact on the rx window behavior during
high-speed transfers.

If the overall delta were used to decide when to exit the loop,
backlog_len could be updated on each iteration while still keeping the
loop bounded.

Weighing the expected high-throughput case against the
unexpected/infrequent "wild producer" scenario, does it still seem best
to keep the above code as-is? I'll see what you decide in v6 :)

- Mat