* [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up
@ 2025-11-13 0:10 Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Paolo Abeni @ 2025-11-13 0:10 UTC (permalink / raw)
To: mptcp
This is a new revision of:
https://lore.kernel.org/mptcp/ed5c5ea1de503f6dc8a514b5eb4c9d16c431646f.1762696333.git.pabeni@redhat.com/
It turns out the previous revision was prone to a number of ugly races.
Addressing them needs some pre-reqs.
The races are due to MPJ subflows racing with accept() and possibly adding
unaccounted memory to the backlog while the accounting code is trying to
handle backlog_len.
Paolo Abeni (3):
mptcp: fix grafting corner case
Squash-to: "mptcp: fix memcg accounting for passive sockets"
Squash-to: "mptcp: leverage the backlog for RX packet processing"
net/mptcp/protocol.c | 104 +++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 1 +
2 files changed, 87 insertions(+), 18 deletions(-)
--
2.51.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
@ 2025-11-13 0:10 ` Paolo Abeni
2025-11-13 8:47 ` Matthieu Baerts
2025-11-13 0:10 ` [PATCH v3 mptcp-net 2/3] Squash-to: "mptcp: fix memcg accounting for passive sockets" Paolo Abeni
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2025-11-13 0:10 UTC (permalink / raw)
To: mptcp
If a passive MPTCP socket creates active subflows while still unaccepted,
__mptcp_subflow_connect() will try to graft such subflows to the msk,
but the msk struct socket is not yet initialized at that point:
the subflows will misbehave.
Address the issue by always trying to graft the subflow in
mptcp_finish_join(), regardless of the subflow itself being active or
passive. To avoid races with accept(), access msk->sk_socket under
the callback lock.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8965abb94b81..1b3c5fd01600 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -913,12 +913,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
mptcp_subflow_joined(msk, ssk);
spin_unlock_bh(&msk->fallback_lock);
- /* attach to msk socket only after we are sure we will deal with it
- * at close time
- */
- if (sk->sk_socket && !ssk->sk_socket)
- mptcp_sock_graft(ssk, sk->sk_socket);
-
mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
mptcp_sockopt_sync_locked(msk, ssk);
mptcp_stop_tout_timer(sk);
@@ -3734,6 +3728,20 @@ void mptcp_sock_graft(struct sock *sk, struct socket *parent)
write_unlock_bh(&sk->sk_callback_lock);
}
+static void mptcp_check_graft(struct sock *sk, struct sock *ssk)
+{
+ struct socket *sock;
+
+ if (ssk->sk_socket)
+ return;
+
+ write_lock_bh(&sk->sk_callback_lock);
+ sock = sk->sk_socket;
+ write_lock_bh(&sk->sk_callback_lock);
+ if (sock)
+ mptcp_sock_graft(ssk, sock);
+}
+
bool mptcp_finish_join(struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -3758,6 +3766,7 @@ bool mptcp_finish_join(struct sock *ssk)
}
mptcp_subflow_joined(msk, ssk);
spin_unlock_bh(&msk->fallback_lock);
+ mptcp_check_graft(parent, ssk);
mptcp_propagate_sndbuf(parent, ssk);
return true;
}
@@ -3767,6 +3776,8 @@ bool mptcp_finish_join(struct sock *ssk)
goto err_prohibited;
}
+ mptcp_check_graft(parent, ssk);
+
/* If we can't acquire msk socket lock here, let the release callback
* handle it
*/
--
2.51.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 mptcp-net 2/3] Squash-to: "mptcp: fix memcg accounting for passive sockets"
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case Paolo Abeni
@ 2025-11-13 0:10 ` Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2025-11-13 0:10 UTC (permalink / raw)
To: mptcp
__mptcp_inherit_memcg() is currently invoked by mptcp_graph_subflows()
with the wrong GFP flags, as lock_sock_fast() can leave the caller in
atomic scope. Since this is not a critical fast path, use plain
lock_sock() instead.
Additionally, ensure the CG is correctly set even for active subflows
of a not-yet-accepted passive msk.
Finally, fix a typo in the mentioned helper name.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
I'm sorry for the bad directions again. I had to completely rework
the next patch due to several races still present, and that
required the change here, which would otherwise have been quite unexpected.
---
net/mptcp/protocol.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1b3c5fd01600..addd8025d235 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -916,8 +916,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
mptcp_sockopt_sync_locked(msk, ssk);
mptcp_stop_tout_timer(sk);
- __mptcp_inherit_cgrp_data(sk, ssk);
- __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
__mptcp_propagate_sndbuf(sk, ssk);
return true;
}
@@ -3737,9 +3735,13 @@ static void mptcp_check_graft(struct sock *sk, struct sock *ssk)
write_lock_bh(&sk->sk_callback_lock);
sock = sk->sk_socket;
- write_lock_bh(&sk->sk_callback_lock);
- if (sock)
+ write_unlock_bh(&sk->sk_callback_lock);
+
+ if (sock) {
mptcp_sock_graft(ssk, sock);
+ __mptcp_inherit_cgrp_data(sk, ssk);
+ __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
+ }
}
bool mptcp_finish_join(struct sock *ssk)
@@ -4052,18 +4054,17 @@ static int mptcp_listen(struct socket *sock, int backlog)
return err;
}
-static void mptcp_graph_subflows(struct sock *sk)
+static void mptcp_graft_subflows(struct sock *sk)
{
struct mptcp_subflow_context *subflow;
struct mptcp_sock *msk = mptcp_sk(sk);
mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
- bool slow;
- slow = lock_sock_fast(ssk);
+ lock_sock(ssk);
- /* set ssk->sk_socket of accept()ed flows to mptcp socket.
+ /* Set ssk->sk_socket of accept()ed flows to mptcp socket.
* This is needed so NOSPACE flag can be set from tcp stack.
*/
if (!ssk->sk_socket)
@@ -4071,7 +4072,7 @@ static void mptcp_graph_subflows(struct sock *sk)
__mptcp_inherit_cgrp_data(sk, ssk);
__mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
- unlock_sock_fast(ssk, slow);
+ release_sock(ssk);
}
}
@@ -4122,7 +4123,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
msk = mptcp_sk(newsk);
msk->in_accept_queue = 0;
- mptcp_graph_subflows(newsk);
+ mptcp_graft_subflows(newsk);
mptcp_rps_record_subflows(msk);
/* Do late cleanup for the first subflow as necessary. Also
--
2.51.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 2/3] Squash-to: "mptcp: fix memcg accounting for passive sockets" Paolo Abeni
@ 2025-11-13 0:10 ` Paolo Abeni
2025-11-13 9:00 ` Matthieu Baerts
` (2 more replies)
2025-11-13 0:35 ` [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up MPTCP CI
2025-11-13 1:21 ` MPTCP CI
4 siblings, 3 replies; 13+ messages in thread
From: Paolo Abeni @ 2025-11-13 0:10 UTC (permalink / raw)
To: mptcp
If a subflow receives data before gaining the memcg while the msk
socket lock is held at accept time, or the PM locks the still
unaccepted msk socket while subflows push data to it at the same time,
mptcp_graft_subflows() can complete with a non-empty backlog.
The msk will try to borrow such memory, but some of the skbs there
were not memcg charged. When the msk finally returns such accounted
memory, we should hit the same splat as in #597.
[even if so far I was unable to replicate this scenario]
This patch tries to address such a potential issue by:
- explicitly keeping track of the amount of memory added to the
backlog that is not CG accounted
- additionally accounting for such memory at accept time
- preventing any subflow from adding non CG accounted memory to the
backlog after the above flush
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 64 +++++++++++++++++++++++++++++++++++++++++---
net/mptcp/protocol.h | 1 +
2 files changed, 61 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index addd8025d235..abf0edc4b888 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -658,6 +658,7 @@ static void __mptcp_add_backlog(struct sock *sk,
{
struct mptcp_sock *msk = mptcp_sk(sk);
struct sk_buff *tail = NULL;
+ struct sock *ssk = skb->sk;
bool fragstolen;
int delta;
@@ -671,18 +672,26 @@ static void __mptcp_add_backlog(struct sock *sk,
tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
- skb->sk == tail->sk &&
+ ssk == tail->sk &&
__mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
skb->truesize -= delta;
kfree_skb_partial(skb, fragstolen);
__mptcp_subflow_lend_fwdmem(subflow, delta);
- WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
- return;
+ goto account;
}
list_add_tail(&skb->list, &msk->backlog_list);
mptcp_subflow_lend_fwdmem(subflow, skb);
- WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
+ delta = skb->truesize;
+
+account:
+ WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
+
+ /* Possibly not accept()ed yet, keep track of memory not CG
+ * accounted, mptcp_grapt_subflows will handle it.
+ */
+ if (!ssk->sk_memcg)
+ msk->backlog_unaccounted += delta;
}
static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
@@ -2154,6 +2163,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
{
struct mptcp_sock *msk = mptcp_sk(sk);
+ /* After CG initialization, subflows should never add skb before
+ * gaining the CG themself.
+ */
+ DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
+ mem_cgroup_from_sk(sk));
+
/* Don't spool the backlog if the rcvbuf is full. */
if (list_empty(&msk->backlog_list) ||
sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
@@ -4059,6 +4074,22 @@ static void mptcp_graft_subflows(struct sock *sk)
struct mptcp_subflow_context *subflow;
struct mptcp_sock *msk = mptcp_sk(sk);
+ if (mem_cgroup_sockets_enabled) {
+ LIST_HEAD(join_list);
+
+ /* Subflows joining after __inet_accept() with get the
+ * mem CG properly initialized at mptcp_finish_join() time,
+ * but subflows pending in join_list need explicit
+ * initialization before flushing `backlog_unaccounted`
+ * or we can cat unexpeced unaccounted memory later.
+ */
+ mptcp_data_lock(sk);
+ list_splice_init(&msk->join_list, &join_list);
+ mptcp_data_unlock(sk);
+
+ __mptcp_flush_join_list(sk, &join_list);
+ }
+
mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -4070,10 +4101,35 @@ static void mptcp_graft_subflows(struct sock *sk)
if (!ssk->sk_socket)
mptcp_sock_graft(ssk, sk->sk_socket);
+ if (!mem_cgroup_sk_enabled(sk))
+ goto unlock;
+
__mptcp_inherit_cgrp_data(sk, ssk);
__mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
+
+unlock:
release_sock(ssk);
}
+
+ if (mem_cgroup_sk_enabled(sk)) {
+ gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
+ int amt;
+
+ /* Account the backlog memory; prior accept() is aware of
+ * fwd and rmem only
+ */
+ mptcp_data_lock(sk);
+ amt = sk_mem_pages(sk->sk_forward_alloc +
+ msk->backlog_unaccounted +
+ atomic_read(&sk->sk_rmem_alloc)) -
+ sk_mem_pages(sk->sk_forward_alloc +
+ atomic_read(&sk->sk_rmem_alloc));
+ msk->backlog_unaccounted = 0;
+ mptcp_data_unlock(sk);
+
+ if (amt)
+ mem_cgroup_sk_charge(sk, amt, gfp);
+ }
}
static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 161b704be16b..199f28f3dd5e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -360,6 +360,7 @@ struct mptcp_sock {
struct list_head backlog_list; /* protected by the data lock */
u32 backlog_len;
+ u32 backlog_unaccounted;
};
#define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
--
2.51.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
` (2 preceding siblings ...)
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
@ 2025-11-13 0:35 ` MPTCP CI
2025-11-13 1:21 ` MPTCP CI
4 siblings, 0 replies; 13+ messages in thread
From: MPTCP CI @ 2025-11-13 0:35 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
But sadly, our CI spotted some issues with it when trying to build it.
You can find more details there:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19316383224
Status: failure
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/42d108656927
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1022713
Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
` (3 preceding siblings ...)
2025-11-13 0:35 ` [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up MPTCP CI
@ 2025-11-13 1:21 ` MPTCP CI
4 siblings, 0 replies; 13+ messages in thread
From: MPTCP CI @ 2025-11-13 1:21 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19316383216
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/42d108656927
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1022713
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case
2025-11-13 0:10 ` [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case Paolo Abeni
@ 2025-11-13 8:47 ` Matthieu Baerts
2025-11-13 17:09 ` Paolo Abeni
0 siblings, 1 reply; 13+ messages in thread
From: Matthieu Baerts @ 2025-11-13 8:47 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
On 13/11/2025 01:10, Paolo Abeni wrote:
> If a passive MPTCP socket creates active subflows while still unaccepted,
> __mptcp_subflow_connect() will try to graft such subflows to the msk,
> but the msk struct socket is not yet initialized at that point:
> the subflows will misbehave.
What kind of errors were visible?
> Address the issue always trying to graft the subflow in
> mptcp_finish_join(), regardless of the subflow itself being active or
> passive. To avoid races with accept(), access the msk->sk_socket under
> the callback lock.
Thank you for addressing this issue!
>
By chance, do you have a Fixes tag to add here? (if it is for -net)
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 23 +++++++++++++++++------
> 1 file changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 8965abb94b81..1b3c5fd01600 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -913,12 +913,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
> mptcp_subflow_joined(msk, ssk);
> spin_unlock_bh(&msk->fallback_lock);
>
> - /* attach to msk socket only after we are sure we will deal with it
> - * at close time
> - */
> - if (sk->sk_socket && !ssk->sk_socket)
> - mptcp_sock_graft(ssk, sk->sk_socket);
> -
> mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
> mptcp_sockopt_sync_locked(msk, ssk);
> mptcp_stop_tout_timer(sk);
> @@ -3734,6 +3728,20 @@ void mptcp_sock_graft(struct sock *sk, struct socket *parent)
> write_unlock_bh(&sk->sk_callback_lock);
> }
>
> +static void mptcp_check_graft(struct sock *sk, struct sock *ssk)
> +{
> + struct socket *sock;
> +
> + if (ssk->sk_socket)
> + return;
> +
> + write_lock_bh(&sk->sk_callback_lock);
> + sock = sk->sk_socket;
> + write_lock_bh(&sk->sk_callback_lock);
The build job failed because here it should be the unlock version
(write_unlock_bh()).
(and probably an empty line just after).
If there is only that, I can fix that when applying the patches.
> + if (sock)
> + mptcp_sock_graft(ssk, sock);
> +}
> +
> bool mptcp_finish_join(struct sock *ssk)
> {
> struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> @@ -3758,6 +3766,7 @@ bool mptcp_finish_join(struct sock *ssk)
> }
> mptcp_subflow_joined(msk, ssk);
> spin_unlock_bh(&msk->fallback_lock);
> + mptcp_check_graft(parent, ssk);
> mptcp_propagate_sndbuf(parent, ssk);
> return true;
> }
> @@ -3767,6 +3776,8 @@ bool mptcp_finish_join(struct sock *ssk)
> goto err_prohibited;
> }
>
> + mptcp_check_graft(parent, ssk);
Is it OK to graft it even in case of errors in __mptcp_finish_join()?
i.e. not in an established state or !msk->allow_subflows.
Should it not be done after the block below, if there was no error?
> +
> /* If we can't acquire msk socket lock here, let the release callback
> * handle it
> */
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
@ 2025-11-13 9:00 ` Matthieu Baerts
2025-11-13 11:30 ` kernel test robot
2025-11-13 11:52 ` kernel test robot
2 siblings, 0 replies; 13+ messages in thread
From: Matthieu Baerts @ 2025-11-13 9:00 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 13/11/2025 01:10, Paolo Abeni wrote:
> If a subflow receives data before gaining the memcg while the msk
> socket lock is held at accept time, or the PM locks the msk socket
> while still unaccepted and subflows push data to it at the same time,
> the mptcp_graph_subflows() can complete with a non empty backlog.
>
> The msk will try to borrow such memory, but (some) of the skbs there
> where not memcg charged. When the msk finally will return such accounted
> memory, we should hit the same splat of #597.
> [even if so far I was unable to replicate this scenario]
>
> This patch tries to address such potential issue by:
> - explicitly keep track of the amount of memory added to the backlog
> not CG accounted
> - additionally accouting for such memory at accept time
> - preventing any subflow from adding memory to the backlog not CG
> accounted after the above flush
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 64 +++++++++++++++++++++++++++++++++++++++++---
> net/mptcp/protocol.h | 1 +
> 2 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index addd8025d235..abf0edc4b888 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -658,6 +658,7 @@ static void __mptcp_add_backlog(struct sock *sk,
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
> struct sk_buff *tail = NULL;
> + struct sock *ssk = skb->sk;
> bool fragstolen;
> int delta;
>
> @@ -671,18 +672,26 @@ static void __mptcp_add_backlog(struct sock *sk,
> tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
>
> if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
> - skb->sk == tail->sk &&
> + ssk == tail->sk &&
> __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
> skb->truesize -= delta;
> kfree_skb_partial(skb, fragstolen);
> __mptcp_subflow_lend_fwdmem(subflow, delta);
> - WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
> - return;
> + goto account;
> }
>
> list_add_tail(&skb->list, &msk->backlog_list);
> mptcp_subflow_lend_fwdmem(subflow, skb);
> - WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
> + delta = skb->truesize;
> +
> +account:
> + WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
> +
> + /* Possibly not accept()ed yet, keep track of memory not CG
> + * accounted, mptcp_grapt_subflows will handle it.
detail: s/grapt/graft
> + */
> + if (!ssk->sk_memcg)
> + msk->backlog_unaccounted += delta;
> }
>
> static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> @@ -2154,6 +2163,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
>
> + /* After CG initialization, subflows should never add skb before
> + * gaining the CG themself.
> + */
> + DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
> + mem_cgroup_from_sk(sk));
> +
> /* Don't spool the backlog if the rcvbuf is full. */
> if (list_empty(&msk->backlog_list) ||
> sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> @@ -4059,6 +4074,22 @@ static void mptcp_graft_subflows(struct sock *sk)
> struct mptcp_subflow_context *subflow;
> struct mptcp_sock *msk = mptcp_sk(sk);
>
> + if (mem_cgroup_sockets_enabled) {
> + LIST_HEAD(join_list);
> +
> + /* Subflows joining after __inet_accept() with get the
> + * mem CG properly initialized at mptcp_finish_join() time,
> + * but subflows pending in join_list need explicit
> + * initialization before flushing `backlog_unaccounted`
> + * or we can cat unexpeced unaccounted memory later.
detail: s/cat/catch/ or s/cat/get/?
> + */
> + mptcp_data_lock(sk);
> + list_splice_init(&msk->join_list, &join_list);
> + mptcp_data_unlock(sk);
> +
> + __mptcp_flush_join_list(sk, &join_list);
> + }
> +
> mptcp_for_each_subflow(msk, subflow) {
> struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>
> @@ -4070,10 +4101,35 @@ static void mptcp_graft_subflows(struct sock *sk)
> if (!ssk->sk_socket)
> mptcp_sock_graft(ssk, sk->sk_socket);
>
> + if (!mem_cgroup_sk_enabled(sk))
detail: what's the cost of this call? (static key + get)
Do we need to use a local variable to call this helper only once?
> + goto unlock;
> +
> __mptcp_inherit_cgrp_data(sk, ssk);
> __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
> +
> +unlock:
> release_sock(ssk);
> }
> +
> + if (mem_cgroup_sk_enabled(sk)) {
> + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> + int amt;
> +
> + /* Account the backlog memory; prior accept() is aware of
> + * fwd and rmem only
> + */
> + mptcp_data_lock(sk);
> + amt = sk_mem_pages(sk->sk_forward_alloc +
> + msk->backlog_unaccounted +
> + atomic_read(&sk->sk_rmem_alloc)) -
> + sk_mem_pages(sk->sk_forward_alloc +
> + atomic_read(&sk->sk_rmem_alloc));
> + msk->backlog_unaccounted = 0;
> + mptcp_data_unlock(sk);
> +
> + if (amt)
> + mem_cgroup_sk_charge(sk, amt, gfp);
> + }
> }
>
> static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 161b704be16b..199f28f3dd5e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -360,6 +360,7 @@ struct mptcp_sock {
>
> struct list_head backlog_list; /* protected by the data lock */
> u32 backlog_len;
> + u32 backlog_unaccounted;
> };
>
> #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
2025-11-13 9:00 ` Matthieu Baerts
@ 2025-11-13 11:30 ` kernel test robot
2025-11-13 17:16 ` Matthieu Baerts
2025-11-13 11:52 ` kernel test robot
2 siblings, 1 reply; 13+ messages in thread
From: kernel test robot @ 2025-11-13 11:30 UTC (permalink / raw)
To: Paolo Abeni, mptcp; +Cc: oe-kbuild-all
Hi Paolo,
kernel test robot noticed the following build errors:
[auto build test ERROR on mptcp/export]
[cannot apply to mptcp/export-net linus/master v6.18-rc5 next-20251113]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-fix-grafting-corner-case/20251113-081224
base: https://github.com/multipath-tcp/mptcp_net-next.git export
patch link: https://lore.kernel.org/r/ee40999f77d0186abba2a6b21958c1a8c4881c57.1762992570.git.pabeni%40redhat.com
patch subject: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20251113/202511131931.1yh3eKYe-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251113/202511131931.1yh3eKYe-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511131931.1yh3eKYe-lkp@intel.com/
All errors (new ones prefixed by >>):
net/mptcp/protocol.c: In function '__mptcp_add_backlog':
>> net/mptcp/protocol.c:693:17: error: 'struct sock' has no member named 'sk_memcg'
693 | if (!ssk->sk_memcg)
| ^~
vim +693 net/mptcp/protocol.c
654
655 static void __mptcp_add_backlog(struct sock *sk,
656 struct mptcp_subflow_context *subflow,
657 struct sk_buff *skb)
658 {
659 struct mptcp_sock *msk = mptcp_sk(sk);
660 struct sk_buff *tail = NULL;
661 struct sock *ssk = skb->sk;
662 bool fragstolen;
663 int delta;
664
665 if (unlikely(sk->sk_state == TCP_CLOSE)) {
666 kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
667 return;
668 }
669
670 /* Try to coalesce with the last skb in our backlog */
671 if (!list_empty(&msk->backlog_list))
672 tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
673
674 if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
675 ssk == tail->sk &&
676 __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
677 skb->truesize -= delta;
678 kfree_skb_partial(skb, fragstolen);
679 __mptcp_subflow_lend_fwdmem(subflow, delta);
680 goto account;
681 }
682
683 list_add_tail(&skb->list, &msk->backlog_list);
684 mptcp_subflow_lend_fwdmem(subflow, skb);
685 delta = skb->truesize;
686
687 account:
688 WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
689
690 /* Possibly not accept()ed yet, keep track of memory not CG
691 * accounted, mptcp_grapt_subflows will handle it.
692 */
> 693 if (!ssk->sk_memcg)
694 msk->backlog_unaccounted += delta;
695 }
696
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
2025-11-13 9:00 ` Matthieu Baerts
2025-11-13 11:30 ` kernel test robot
@ 2025-11-13 11:52 ` kernel test robot
2 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2025-11-13 11:52 UTC (permalink / raw)
To: Paolo Abeni, mptcp; +Cc: llvm, oe-kbuild-all
Hi Paolo,
kernel test robot noticed the following build errors:
[auto build test ERROR on mptcp/export]
[cannot apply to mptcp/export-net linus/master v6.18-rc5 next-20251113]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-fix-grafting-corner-case/20251113-081224
base: https://github.com/multipath-tcp/mptcp_net-next.git export
patch link: https://lore.kernel.org/r/ee40999f77d0186abba2a6b21958c1a8c4881c57.1762992570.git.pabeni%40redhat.com
patch subject: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
config: sparc64-defconfig (https://download.01.org/0day-ci/archive/20251113/202511131922.61JkdJO9-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251113/202511131922.61JkdJO9-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511131922.61JkdJO9-lkp@intel.com/
All errors (new ones prefixed by >>):
>> net/mptcp/protocol.c:693:12: error: no member named 'sk_memcg' in 'struct sock'
693 | if (!ssk->sk_memcg)
| ~~~ ^
1 error generated.
vim +693 net/mptcp/protocol.c
654
655 static void __mptcp_add_backlog(struct sock *sk,
656 struct mptcp_subflow_context *subflow,
657 struct sk_buff *skb)
658 {
659 struct mptcp_sock *msk = mptcp_sk(sk);
660 struct sk_buff *tail = NULL;
661 struct sock *ssk = skb->sk;
662 bool fragstolen;
663 int delta;
664
665 if (unlikely(sk->sk_state == TCP_CLOSE)) {
666 kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
667 return;
668 }
669
670 /* Try to coalesce with the last skb in our backlog */
671 if (!list_empty(&msk->backlog_list))
672 tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
673
674 if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
675 ssk == tail->sk &&
676 __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
677 skb->truesize -= delta;
678 kfree_skb_partial(skb, fragstolen);
679 __mptcp_subflow_lend_fwdmem(subflow, delta);
680 goto account;
681 }
682
683 list_add_tail(&skb->list, &msk->backlog_list);
684 mptcp_subflow_lend_fwdmem(subflow, skb);
685 delta = skb->truesize;
686
687 account:
688 WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
689
690 /* Possibly not accept()ed yet, keep track of memory not CG
691 * accounted, mptcp_grapt_subflows will handle it.
692 */
> 693 if (!ssk->sk_memcg)
694 msk->backlog_unaccounted += delta;
695 }
696
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case
2025-11-13 8:47 ` Matthieu Baerts
@ 2025-11-13 17:09 ` Paolo Abeni
2025-11-13 17:14 ` Matthieu Baerts
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2025-11-13 17:09 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: mptcp
On 11/13/25 9:47 AM, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 13/11/2025 01:10, Paolo Abeni wrote:
>> If a passive MPTCP socket creates active subflows while still unaccepted,
>> __mptcp_subflow_connect() will try to graft such subflows to the msk,
>> but the msk struct socket is not yet initialized at that point:
>> the subflows will misbehave.
>
> What kind of errors were visible?
Found by code inspection, never reproduced in practice. Thinking again
about it, I'm no longer sure this is actually needed: the subflow
mentioned above should be caught and fixed at accept time.
/me needs more coffee and thinking...
>> +static void mptcp_check_graft(struct sock *sk, struct sock *ssk)
>> +{
>> + struct socket *sock;
>> +
>> + if (ssk->sk_socket)
>> + return;
>> +
>> + write_lock_bh(&sk->sk_callback_lock);
>> + sock = sk->sk_socket;
>> + write_lock_bh(&sk->sk_callback_lock);
>
> The build job failed because here it should be the unlock version
> (write_unlock_bh()).
> (and probably an empty line just after).
Oh, it looks like I shared an older revision, sorry.
> If there is only that, I can fix that when applying the patches.
>
>> + if (sock)
>> + mptcp_sock_graft(ssk, sock);
>> +}
>> +
>> bool mptcp_finish_join(struct sock *ssk)
>> {
>> struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
>> @@ -3758,6 +3766,7 @@ bool mptcp_finish_join(struct sock *ssk)
>> }
>> mptcp_subflow_joined(msk, ssk);
>> spin_unlock_bh(&msk->fallback_lock);
>> + mptcp_check_graft(parent, ssk);
>> mptcp_propagate_sndbuf(parent, ssk);
>> return true;
>> }
>> @@ -3767,6 +3776,8 @@ bool mptcp_finish_join(struct sock *ssk)
>> goto err_prohibited;
>> }
>>
>> + mptcp_check_graft(parent, ssk);
>
> Is it OK to graft it even in case of errors in __mptcp_finish_join()?
> i.e. not in an established state or !msk->allow_subflows.
>
> Should it not be done after the block below, if there was no error?
My understanding/hope is that grafting early doesn't cause issues:
- no UaF access to ssk->sk_socket, because the ssk either finishes the
join successfully or is reset/deleted very soon.
- no ref leak, because the ssk does not acquire any actual reference on
the msk/socket.
It needs to be done in mptcp_finish_join(), to ensure that in patch 3/3
all the subflows will always push accounted data after
mptcp_graft_subflows() completes - or more specifically after the
`backlog_unaccounted` counter is flushed.
Let me see if I can rework 3/3 without this patch
/P
* Re: [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case
2025-11-13 17:09 ` Paolo Abeni
@ 2025-11-13 17:14 ` Matthieu Baerts
0 siblings, 0 replies; 13+ messages in thread
From: Matthieu Baerts @ 2025-11-13 17:14 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
On 13/11/2025 18:09, Paolo Abeni wrote:
> On 11/13/25 9:47 AM, Matthieu Baerts wrote:
>> On 13/11/2025 01:10, Paolo Abeni wrote:
(...)
>>> @@ -3767,6 +3776,8 @@ bool mptcp_finish_join(struct sock *ssk)
>>> goto err_prohibited;
>>> }
>>>
>>> + mptcp_check_graft(parent, ssk);
>>
>> Is it OK to graft it even in case of errors in __mptcp_finish_join()?
>> i.e. not in an established state or !msk->allow_subflows.
>>
>> Should it not be done after the block below, if there was no error?
>
> My understanding/hope is that grafting early doesn't cause issues:
> - no UaF access to ssk->sk_socket, because the ssk either finishes the
> join successfully or is reset/deleted very soon.
> - no ref leak, because the ssk does not acquire any actual reference on
> the msk/socket.
>
> It needs to be done in mptcp_finish_join(), to ensure that in patch 3/3
> all the subflows will always push accounted data after
> mptcp_graft_subflows() completes - or more specifically after the
> `backlog_unaccounted` counter is flushed.
>
> Let me see if I can rework 3/3 without this patch
We don't need to drop this patch if it fixes things. I was only asking
some questions to understand the behavioural change, and whether it was
going to introduce different issues :)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
2025-11-13 11:30 ` kernel test robot
@ 2025-11-13 17:16 ` Matthieu Baerts
0 siblings, 0 replies; 13+ messages in thread
From: Matthieu Baerts @ 2025-11-13 17:16 UTC (permalink / raw)
To: kernel test robot, Paolo Abeni, mptcp; +Cc: oe-kbuild-all
Hello,
On 13/11/2025 12:30, kernel test robot wrote:
> Hi Paolo,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on mptcp/export]
> [cannot apply to mptcp/export-net linus/master v6.18-rc5 next-20251113]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-fix-grafting-corner-case/20251113-081224
> base: https://github.com/multipath-tcp/mptcp_net-next.git export
> patch link: https://lore.kernel.org/r/ee40999f77d0186abba2a6b21958c1a8c4881c57.1762992570.git.pabeni%40redhat.com
> patch subject: [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
> config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20251113/202511131931.1yh3eKYe-lkp@intel.com/config)
> compiler: or1k-linux-gcc (GCC) 15.1.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251113/202511131931.1yh3eKYe-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202511131931.1yh3eKYe-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> net/mptcp/protocol.c: In function '__mptcp_add_backlog':
>>> net/mptcp/protocol.c:693:17: error: 'struct sock' has no member named 'sk_memcg'
> 693 | if (!ssk->sk_memcg)
I guess mem_cgroup_from_sk() should be used if this patch doesn't target
-net.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
end of thread, other threads:[~2025-11-13 17:16 UTC | newest]
Thread overview: 13+ messages
2025-11-13 0:10 [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 1/3] mptcp: fix grafting corner case Paolo Abeni
2025-11-13 8:47 ` Matthieu Baerts
2025-11-13 17:09 ` Paolo Abeni
2025-11-13 17:14 ` Matthieu Baerts
2025-11-13 0:10 ` [PATCH v3 mptcp-net 2/3] Squash-to: "mptcp: fix memcg accounting for passive sockets" Paolo Abeni
2025-11-13 0:10 ` [PATCH v3 mptcp-net 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing" Paolo Abeni
2025-11-13 9:00 ` Matthieu Baerts
2025-11-13 11:30 ` kernel test robot
2025-11-13 17:16 ` Matthieu Baerts
2025-11-13 11:52 ` kernel test robot
2025-11-13 0:35 ` [PATCH v3 mptcp-net 0/3] mptcp: cg and backlog follow-up MPTCP CI
2025-11-13 1:21 ` MPTCP CI