From: Paolo Abeni <pabeni@redhat.com>
To: Matthieu Baerts <matttbe@kernel.org>,
mptcp@lists.linux.dev, Geliang Tang <geliang@kernel.org>
Subject: Re: [PATCH v2 mptcp-next] Squash-to: "mptcp: leverage the backlog for RX packet processing"
Date: Wed, 12 Nov 2025 10:24:46 +0100 [thread overview]
Message-ID: <27e7304b-d485-4e01-bd47-a169c5b25849@redhat.com> (raw)
In-Reply-To: <704b4358-30b2-4065-abbb-752f4f9a3c79@kernel.org>
On 11/11/25 5:21 PM, Matthieu Baerts wrote:
> On 09/11/2025 14:53, Paolo Abeni wrote:
>> If a subflow receives data before gaining the memcg while the msk
>> socket lock is held at accept time, or the PM locks the msk socket
>> while still unaccepted and subflows push data to it at the same time,
>> the mptcp_graph_subflows() can complete with a non empty backlog.
>>
>> The msk will try to borrow such memory, but (some) of the skbs there
>> where not memcg charged. When the msk finally will return such accounted
>> memory, we should hit the same splat of #597.
>> [even if so far I was unable to replicate this scenario]
>>
>> This patch tries to address such potential issue by:
>> - preventing the subflow from queuing data into the backlog after
>> gaining the memcg. This ensure that at the end of the look all the
>> skbs in the backlog (if any) are _not_ memory accounted.
>> - mem charge the backlog to msk
>> - 'restart' the subflow and spool any data waiting there.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> net/mptcp/protocol.c | 46 ++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 5e9325c7ea9c..d6b08e1de358 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -4082,10 +4082,12 @@ static void mptcp_graph_subflows(struct sock *sk)
>> {
>> struct mptcp_subflow_context *subflow;
>> struct mptcp_sock *msk = mptcp_sk(sk);
>> + struct sock *ssk;
>> + int old_amt, amt;
>> + bool slow;
>>
>> mptcp_for_each_subflow(msk, subflow) {
>> - struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>> - bool slow;
>> + ssk = mptcp_subflow_tcp_sock(subflow);
>>
>> slow = lock_sock_fast(ssk);
>>
>> @@ -4095,8 +4097,48 @@ static void mptcp_graph_subflows(struct sock *sk)
>> if (!ssk->sk_socket)
>> mptcp_sock_graft(ssk, sk->sk_socket);
>>
>> + if (!mem_cgroup_from_sk(sk))
>
> Should we not call mem_cgroup_sk_enabled() instead? It does this:
>
> return mem_cgroup_sockets_enabled && mem_cgroup_from_sk(sk);
>
> That's what is done in net/core/sock.c and net/ipv4/tcp_output.c. Not in
> __inet_accept(), because mem_cgroup_sockets_enabled() is checked before.
> Maybe we should do the same here?
>
> (Note that it is not clear to me if mem_cgroup can be enabled later on,
> and if yes, what should be done with existing connections.)
It's just an additional optimization, to leverage static branch, but
it's not strictly needed. Can be added, thus.
>> + goto unlock;
>> +
>> __mptcp_inherit_cgrp_data(sk, ssk);
>> __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
>> +
>> + /* Prevent subflows from queueing data into the backlog
>> + * as soon as cg is set; note that we can't race
>> + * with __mptcp_close_ssk setting this bit for a really
>> + * closing socket, because we hold the msk socket lock here.
>> + */
>> + subflow->closing = 1;
>> +
>> +unlock:
>> + unlock_sock_fast(ssk, slow);
>> + }
>> +
>> + if (!mem_cgroup_from_sk(sk))
>
> Same here?
>
>> + return;
>> +
>> + /* Charge the bl memory, note that __sk_charge accounted for
>> + * fwd memory and rmem only
>> + */
>> + mptcp_data_lock(sk);
>> + old_amt = sk_mem_pages(sk->sk_forward_alloc +
>> + atomic_read(&sk->sk_rmem_alloc));
>> + amt = sk_mem_pages(msk->backlog_len + sk->sk_forward_alloc +
>> + atomic_read(&sk->sk_rmem_alloc));
>
> (Same as Geliang for the alignment here, and eventually calling
> kmem_cache_charge() like in __inet_accept())
This and the next are the more obscure point. I chose to not call
kmem_cache_charge() because I'm (was) a bit doubtful about such call
being legit in __inet_accept(): active (plain TCP) sockets are not
accounted, just passive ones. Re-thinking about it I guess it's better
to be consistent with TCP than trying to be smarted (history has proved
it does not work so well :-P)
TL;DR: I'll add the missing kmem_cache_charge();
>> + amt -= old_amt;
>> + if (amt)
>> + mem_cgroup_sk_charge(sk, amt, GFP_ATOMIC | __GFP_NOFAIL);
>
> Just to be sure: no need to check if there was an error? It is not done
> in __inet_accept() either, so I guess no?
The __GFP_NOFAIL flag ensures that the call can not fail. Adding it with
GFP_ATOMIC is at least "original" (this is the only call site with this
flags combo). In the next version I'll move the call outside the
spinlock (we are still under the msk socket lock) to replace GFP_ATOMIC
with GFP_KERNEL.
Many thanks for all the review effort!
/P
next prev parent reply other threads:[~2025-11-12 9:24 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-09 13:53 [PATCH v2 mptcp-next] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
2025-11-09 22:01 ` MPTCP CI
2025-11-11 7:21 ` Geliang Tang
2025-11-11 17:14 ` Matthieu Baerts
2025-11-11 16:21 ` Matthieu Baerts
2025-11-11 17:09 ` Matthieu Baerts
2025-11-12 9:24 ` Paolo Abeni [this message]
2025-11-12 9:52 ` Matthieu Baerts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=27e7304b-d485-4e01-bd47-a169c5b25849@redhat.com \
--to=pabeni@redhat.com \
--cc=geliang@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox