MPTCP Linux Development
From: Matthieu Baerts <matttbe@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>, mptcp@lists.linux.dev
Subject: Re: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
Date: Thu, 13 Nov 2025 10:00:54 +0100
Message-ID: <5253a2f3-99b0-4ac9-8d17-74c86bea2ef8@kernel.org>
In-Reply-To: <ee40999f77d0186abba2a6b21958c1a8c4881c57.1762992570.git.pabeni@redhat.com>

Hi Paolo,

On 13/11/2025 01:10, Paolo Abeni wrote:
> If a subflow receives data before gaining the memcg while the msk
> socket lock is held at accept time, or the PM locks the msk socket
> while still unaccepted and subflows push data to it at the same time,
> mptcp_graft_subflows() can complete with a non-empty backlog.
> 
> The msk will try to borrow such memory, but some of the skbs there
> were not memcg charged. When the msk finally returns such accounted
> memory, we could hit the same splat as #597.
> [even if so far I was unable to replicate this scenario]
> 
> This patch tries to address such a potential issue by:
> - explicitly keeping track of the amount of memory added to the
>   backlog without CG accounting
> - additionally accounting for such memory at accept time
> - preventing any subflow from adding non CG accounted memory to the
>   backlog after the above flush
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/mptcp/protocol.c | 64 +++++++++++++++++++++++++++++++++++++++++---
>  net/mptcp/protocol.h |  1 +
>  2 files changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index addd8025d235..abf0edc4b888 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -658,6 +658,7 @@ static void __mptcp_add_backlog(struct sock *sk,
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
>  	struct sk_buff *tail = NULL;
> +	struct sock *ssk = skb->sk;
>  	bool fragstolen;
>  	int delta;
>  
> @@ -671,18 +672,26 @@ static void __mptcp_add_backlog(struct sock *sk,
>  		tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
>  
>  	if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
> -	    skb->sk == tail->sk &&
> +	    ssk == tail->sk &&
>  	    __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
>  		skb->truesize -= delta;
>  		kfree_skb_partial(skb, fragstolen);
>  		__mptcp_subflow_lend_fwdmem(subflow, delta);
> -		WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
> -		return;
> +		goto account;
>  	}
>  
>  	list_add_tail(&skb->list, &msk->backlog_list);
>  	mptcp_subflow_lend_fwdmem(subflow, skb);
> -	WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
> +	delta = skb->truesize;
> +
> +account:
> +	WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
> +
> +	/* Possibly not accept()ed yet, keep track of memory not CG
> +	 * accounted, mptcp_grapt_subflows will handle it.

detail: s/grapt/graft

> +	 */
> +	if (!ssk->sk_memcg)
> +		msk->backlog_unaccounted += delta;
>  }
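For reference, the accounting rule above boils down to: a coalesced skb only contributes the truesize delta of the tail skb, a queued skb contributes its full truesize, and whenever the subflow has no memcg yet the same delta is also tracked as unaccounted. A reduced userspace sketch (the struct and function names here are hypothetical, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced model of the counters touched by
 * __mptcp_add_backlog(): no real skbs, locks, or WRITE_ONCE(). */
struct backlog_acct {
	unsigned int backlog_len;          /* total backlog truesize */
	unsigned int backlog_unaccounted;  /* part not charged to any memcg */
};

/* 'delta' is skb->truesize when the skb is queued as-is, or the
 * truesize growth of the tail skb when the payload was coalesced.
 * 'ssk_has_memcg' models ssk->sk_memcg being non-NULL. */
static void backlog_account(struct backlog_acct *a, unsigned int delta,
			    bool ssk_has_memcg)
{
	a->backlog_len += delta;
	if (!ssk_has_memcg)
		a->backlog_unaccounted += delta;
}
```

Both the coalesce path and the plain queue path funnel into the same two additions, which is what the `goto account` restructuring in the hunk achieves.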
>  
>  static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> @@ -2154,6 +2163,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
>  
> +	/* After CG initialization, subflows should never add skb before
> +	 * gaining the CG themself.
> +	 */
> +	DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
> +			       mem_cgroup_from_sk(sk));
> +
>  	/* Don't spool the backlog if the rcvbuf is full. */
>  	if (list_empty(&msk->backlog_list) ||
>  	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> @@ -4059,6 +4074,22 @@ static void mptcp_graft_subflows(struct sock *sk)
>  	struct mptcp_subflow_context *subflow;
>  	struct mptcp_sock *msk = mptcp_sk(sk);
>  
> +	if (mem_cgroup_sockets_enabled) {
> +		LIST_HEAD(join_list);
> +
> +		/* Subflows joining after __inet_accept() with get the
> +		 * mem CG properly initialized at mptcp_finish_join() time,
> +		 * but subflows pending in join_list need explicit
> +		 * initialization before flushing `backlog_unaccounted`
> +		 * or we can cat unexpeced unaccounted memory later.

detail: s/cat/catch/ or s/cat/get/?

> +		 */
> +		mptcp_data_lock(sk);
> +		list_splice_init(&msk->join_list, &join_list);
> +		mptcp_data_unlock(sk);
> +
> +		__mptcp_flush_join_list(sk, &join_list);
> +	}
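The lock/splice/unlock pattern used here detaches the whole pending list under the data lock so it can be walked without holding it, and leaves the source list empty. A minimal standalone sketch of that semantics (plain pointers standing in for `list_head`, the lock reduced to comments; names are assumptions):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked list standing in for the kernel's list_head. */
struct node { struct node *next; };

struct pending {
	struct node *head;  /* protected by the msk data lock */
};

/* Mirrors the lock/splice/unlock pattern above: grab the whole list
 * under the lock, then process it lock-free. */
static struct node *splice_pending(struct pending *p)
{
	struct node *head;

	/* mptcp_data_lock(sk) would be taken here */
	head = p->head;
	p->head = NULL;  /* list_splice_init() leaves the source empty */
	/* mptcp_data_unlock(sk) released here */
	return head;
}
```

The key property is that any subflow joining after the splice lands on a fresh, empty `join_list` and is handled by the later `mptcp_finish_join()` path instead.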
> +
>  	mptcp_for_each_subflow(msk, subflow) {
>  		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>  
> @@ -4070,10 +4101,35 @@ static void mptcp_graft_subflows(struct sock *sk)
>  		if (!ssk->sk_socket)
>  			mptcp_sock_graft(ssk, sk->sk_socket);
>  
> +		if (!mem_cgroup_sk_enabled(sk))

detail: what's the cost of this call? (static key + get)
Do we need to use a local variable to call this helper only once?

> +			goto unlock;
> +
>  		__mptcp_inherit_cgrp_data(sk, ssk);
>  		__mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
> +
> +unlock:
>  		release_sock(ssk);
>  	}
> +
> +	if (mem_cgroup_sk_enabled(sk)) {
> +		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> +		int amt;
> +
> +		/* Account the backlog memory; prior accept() is aware of
> +		 * fwd and rmem only
> +		 */
> +		mptcp_data_lock(sk);
> +		amt = sk_mem_pages(sk->sk_forward_alloc +
> +				   msk->backlog_unaccounted +
> +				   atomic_read(&sk->sk_rmem_alloc)) -
> +		      sk_mem_pages(sk->sk_forward_alloc +
> +				   atomic_read(&sk->sk_rmem_alloc));
> +		msk->backlog_unaccounted = 0;
> +		mptcp_data_unlock(sk);
> +
> +		if (amt)
> +			mem_cgroup_sk_charge(sk, amt, gfp);
> +	}
>  }
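The charge amount computed above is the number of extra whole pages the unaccounted backlog adds on top of what forward alloc plus rmem already cover, so rounding is shared rather than double counted. A userspace sketch of that arithmetic (4 KiB pages and the helper name `backlog_pages_to_charge` are assumptions; the kernel uses `PAGE_SHIFT` and `sk_mem_pages()`):

```c
#include <assert.h>

/* Assumed 4 KiB pages for this sketch. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Mirrors sk_mem_pages(): bytes rounded up to whole pages. */
static long sk_mem_pages(long amt)
{
	return (amt + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Pages still owed to the memcg for backlog memory queued before the
 * msk gained its cgroup: only the pages beyond what fwd_alloc + rmem
 * already account for. */
static long backlog_pages_to_charge(long fwd_alloc, long rmem,
				    long backlog_unaccounted)
{
	return sk_mem_pages(fwd_alloc + backlog_unaccounted + rmem) -
	       sk_mem_pages(fwd_alloc + rmem);
}
```

For example, 4096 unaccounted bytes on top of 200 bytes of existing memory charges one extra page, while an unaccounted backlog that fits into the rounding slack of the existing charge costs nothing, which is why the `if (amt)` guard can skip the memcg call entirely.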
>  
>  static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 161b704be16b..199f28f3dd5e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -360,6 +360,7 @@ struct mptcp_sock {
>  
>  	struct list_head backlog_list;	/* protected by the data lock */
>  	u32		backlog_len;
> +	u32		backlog_unaccounted;
>  };
>  
>  #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


