From: Matthieu Baerts <matttbe@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>, mptcp@lists.linux.dev
Subject: Re: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
Date: Thu, 13 Nov 2025 10:00:54 +0100 [thread overview]
Message-ID: <5253a2f3-99b0-4ac9-8d17-74c86bea2ef8@kernel.org> (raw)
In-Reply-To: <ee40999f77d0186abba2a6b21958c1a8c4881c57.1762992570.git.pabeni@redhat.com>
Hi Paolo,
On 13/11/2025 01:10, Paolo Abeni wrote:
> If a subflow receives data before gaining the memcg while the msk
> socket lock is held at accept time, or the PM locks the msk socket
> while still unaccepted and subflows push data to it at the same time,
> mptcp_graft_subflows() can complete with a non-empty backlog.
>
> The msk will try to borrow such memory, but some of the skbs there
> were not memcg charged. When the msk finally returns such accounted
> memory, we should hit the same splat as in #597.
> [even if so far I was unable to replicate this scenario]
>
> This patch tries to address such a potential issue by:
> - explicitly keeping track of the amount of memory added to the backlog
> that is not CG accounted
> - additionally accounting for such memory at accept time
> - preventing any subflow from adding non-CG-accounted memory to the
> backlog after the above flush
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 64 +++++++++++++++++++++++++++++++++++++++++---
> net/mptcp/protocol.h | 1 +
> 2 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index addd8025d235..abf0edc4b888 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -658,6 +658,7 @@ static void __mptcp_add_backlog(struct sock *sk,
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
> struct sk_buff *tail = NULL;
> + struct sock *ssk = skb->sk;
> bool fragstolen;
> int delta;
>
> @@ -671,18 +672,26 @@ static void __mptcp_add_backlog(struct sock *sk,
> tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
>
> if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
> - skb->sk == tail->sk &&
> + ssk == tail->sk &&
> __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
> skb->truesize -= delta;
> kfree_skb_partial(skb, fragstolen);
> __mptcp_subflow_lend_fwdmem(subflow, delta);
> - WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
> - return;
> + goto account;
> }
>
> list_add_tail(&skb->list, &msk->backlog_list);
> mptcp_subflow_lend_fwdmem(subflow, skb);
> - WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
> + delta = skb->truesize;
> +
> +account:
> + WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
> +
> + /* Possibly not accept()ed yet, keep track of memory not CG
> + * accounted, mptcp_grapt_subflows will handle it.
detail: s/grapt/graft
> + */
> + if (!ssk->sk_memcg)
> + msk->backlog_unaccounted += delta;
> }
>
> static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> @@ -2154,6 +2163,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
>
> + /* After CG initialization, subflows should never add skbs before
> + * gaining the CG themselves.
> + */
> + DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
> + mem_cgroup_from_sk(sk));
> +
> /* Don't spool the backlog if the rcvbuf is full. */
> if (list_empty(&msk->backlog_list) ||
> sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> @@ -4059,6 +4074,22 @@ static void mptcp_graft_subflows(struct sock *sk)
> struct mptcp_subflow_context *subflow;
> struct mptcp_sock *msk = mptcp_sk(sk);
>
> + if (mem_cgroup_sockets_enabled) {
> + LIST_HEAD(join_list);
> +
> + /* Subflows joining after __inet_accept() will get the
> + * mem CG properly initialized at mptcp_finish_join() time,
> + * but subflows pending in join_list need explicit
> + * initialization before flushing `backlog_unaccounted`
> + * or we can cat unexpected unaccounted memory later.
detail: s/cat/catch/ or s/cat/get/?
> + */
> + mptcp_data_lock(sk);
> + list_splice_init(&msk->join_list, &join_list);
> + mptcp_data_unlock(sk);
> +
> + __mptcp_flush_join_list(sk, &join_list);
> + }
> +
> mptcp_for_each_subflow(msk, subflow) {
> struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>
> @@ -4070,10 +4101,35 @@ static void mptcp_graft_subflows(struct sock *sk)
> if (!ssk->sk_socket)
> mptcp_sock_graft(ssk, sk->sk_socket);
>
> + if (!mem_cgroup_sk_enabled(sk))
detail: what's the cost of this call? (static key + get)
Do we need to use a local variable to call this helper only once?
> + goto unlock;
> +
> __mptcp_inherit_cgrp_data(sk, ssk);
> __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
> +
> +unlock:
> release_sock(ssk);
> }
> +
> + if (mem_cgroup_sk_enabled(sk)) {
> + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> + int amt;
> +
> + /* Account the backlog memory; prior accept() is aware of
> + * fwd and rmem only
> + */
> + mptcp_data_lock(sk);
> + amt = sk_mem_pages(sk->sk_forward_alloc +
> + msk->backlog_unaccounted +
> + atomic_read(&sk->sk_rmem_alloc)) -
> + sk_mem_pages(sk->sk_forward_alloc +
> + atomic_read(&sk->sk_rmem_alloc));
> + msk->backlog_unaccounted = 0;
> + mptcp_data_unlock(sk);
> +
> + if (amt)
> + mem_cgroup_sk_charge(sk, amt, gfp);
> + }
> }
>
> static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 161b704be16b..199f28f3dd5e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -360,6 +360,7 @@ struct mptcp_sock {
>
> struct list_head backlog_list; /* protected by the data lock */
> u32 backlog_len;
> + u32 backlog_unaccounted;
> };
>
> #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Thread overview: 13+ messages
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case Paolo Abeni
2025-11-13 8:47 ` Matthieu Baerts
2025-11-13 17:09 ` Paolo Abeni
2025-11-13 17:14 ` Matthieu Baerts
2025-11-13 0:10 ` [PATCH v3 mptcp-net 2/3] Squash-to: "mptcp: fix memcg accounting for passive sockets" Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
2025-11-13 9:00 ` Matthieu Baerts [this message]
2025-11-13 11:30 ` kernel test robot
2025-11-13 17:16 ` Matthieu Baerts
2025-11-13 11:52 ` kernel test robot
2025-11-13 0:35 ` [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up MPTCP CI
2025-11-13 1:21 ` MPTCP CI