* [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets
@ 2025-11-07 21:55 Paolo Abeni
2025-11-07 21:55 ` [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper Paolo Abeni
` (5 more replies)
0 siblings, 6 replies; 19+ messages in thread
From: Paolo Abeni @ 2025-11-07 21:55 UTC (permalink / raw)
To: mptcp
It turns out that the splat reported earlier is indeed due to an old
bug/missing feature that hit hard on top of "mptcp: borrow forward
memory from subflow", as we can borrow accounted memory from a socket
not doing any accounting;)
It should land upstream before the above mentioned commit, ence the
'net' target, but it could as well be net-next material.
I think there is still a racing window to produce the splat reported in
#597: if a subflow receive data while the msk is locked, and before the
memcg is initialized on such subflow, it will push the data into the msk
becklog, and later borrow will again try to grab accounted memory from
non accounting ssk.
I have a tentative fix for that one but it's quite ugly and I'll keep it
separate from this series.
Paolo Abeni (3):
net: factor-out _sk_charge() helper
mptcp: factor-out cgroup data inherit helper
mptcp: fix memcg accounting for passive sockets
include/net/sock.h | 2 ++
net/core/sock.c | 18 ++++++++++++++++++
net/ipv4/af_inet.c | 17 +----------------
net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++----------
net/mptcp/protocol.h | 3 +++
net/mptcp/subflow.c | 30 ++++++++++++++++++++++--------
6 files changed, 72 insertions(+), 34 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
@ 2025-11-07 21:55 ` Paolo Abeni
2025-11-11 7:24 ` Geliang Tang
2025-11-11 17:38 ` Matthieu Baerts
2025-11-07 21:55 ` [PATCH mptcp-net 2/3] mptcp: factor-out cgroup data inherit helper Paolo Abeni
` (4 subsequent siblings)
5 siblings, 2 replies; 19+ messages in thread
From: Paolo Abeni @ 2025-11-07 21:55 UTC (permalink / raw)
To: mptcp
Move out of __inet_accept() the code dealing charging newly
accepted socket to memcg. MPTCP will soon use it to on a per
subflow basis, in different contexts.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/net/sock.h | 2 ++
net/core/sock.c | 18 ++++++++++++++++++
net/ipv4/af_inet.c | 17 +----------------
3 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index a5f36ea9d46f..38d48cfe0741 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1631,6 +1631,8 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
sk_mem_reclaim(sk);
}
+void __sk_charge(struct sock *sk, gfp_t gfp);
+
#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
static inline void sk_owner_set(struct sock *sk, struct module *owner)
{
diff --git a/net/core/sock.c b/net/core/sock.c
index 3b74fc71f51c..b26a6cdc9bcd 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3448,6 +3448,24 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
}
EXPORT_SYMBOL(__sk_mem_reclaim);
+void __sk_charge(struct sock *sk, gfp_t gfp)
+{
+ int amt;
+
+ gfp |= __GFP_NOFAIL;
+ if (mem_cgroup_from_sk(sk)) {
+ /* The socket has not been accepted yet, no need
+ * to look at newsk->sk_wmem_queued.
+ */
+ amt = sk_mem_pages(sk->sk_forward_alloc +
+ atomic_read(&sk->sk_rmem_alloc));
+ if (amt)
+ mem_cgroup_sk_charge(sk, amt, gfp);
+ }
+
+ kmem_cache_charge(sk, gfp);
+}
+
int sk_set_peek_off(struct sock *sk, int val)
{
WRITE_ONCE(sk->sk_peek_off, val);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index a31b94ce8968..08d811f11896 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
{
if (mem_cgroup_sockets_enabled) {
- gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
-
mem_cgroup_sk_alloc(newsk);
-
- if (mem_cgroup_from_sk(newsk)) {
- int amt;
-
- /* The socket has not been accepted yet, no need
- * to look at newsk->sk_wmem_queued.
- */
- amt = sk_mem_pages(newsk->sk_forward_alloc +
- atomic_read(&newsk->sk_rmem_alloc));
- if (amt)
- mem_cgroup_sk_charge(newsk, amt, gfp);
- }
-
- kmem_cache_charge(newsk, gfp);
+ __sk_charge(newsk, GFP_KERNEL);
}
sock_rps_record_flow(newsk);
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH mptcp-net 2/3] mptcp: factor-out cgroup data inherit helper
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
2025-11-07 21:55 ` [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper Paolo Abeni
@ 2025-11-07 21:55 ` Paolo Abeni
2025-11-11 7:24 ` Geliang Tang
2025-11-07 21:55 ` [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2025-11-07 21:55 UTC (permalink / raw)
To: mptcp
MPTCP will soon need the same functionality for passive sockets,
factor them out in a common helper. No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.h | 2 ++
net/mptcp/subflow.c | 20 ++++++++++++--------
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 051f21b06d33..300ac7030958 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -755,6 +755,8 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated)
return ret;
}
+void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk);
+
int mptcp_is_enabled(const struct net *net);
unsigned int mptcp_get_add_addr_timeout(const struct net *net);
int mptcp_is_checksum_enabled(const struct net *net);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 60f100b2a0c9..599a21d92590 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1721,21 +1721,25 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_pm_local *local,
return err;
}
-static void mptcp_attach_cgroup(struct sock *parent, struct sock *child)
+void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk)
{
#ifdef CONFIG_SOCK_CGROUP_DATA
- struct sock_cgroup_data *parent_skcd = &parent->sk_cgrp_data,
- *child_skcd = &child->sk_cgrp_data;
+ struct sock_cgroup_data *sk_cd = &sk->sk_cgrp_data,
+ *ssk_cd = &ssk->sk_cgrp_data;
/* only the additional subflows created by kworkers have to be modified */
- if (cgroup_id(sock_cgroup_ptr(parent_skcd)) !=
- cgroup_id(sock_cgroup_ptr(child_skcd))) {
- cgroup_sk_free(child_skcd);
- *child_skcd = *parent_skcd;
- cgroup_sk_clone(child_skcd);
+ if (cgroup_id(sock_cgroup_ptr(sk_cd)) !=
+ cgroup_id(sock_cgroup_ptr(ssk_cd))) {
+ cgroup_sk_free(ssk_cd);
+ *ssk_cd = *sk_cd;
+ cgroup_sk_clone(sk_cd);
}
#endif /* CONFIG_SOCK_CGROUP_DATA */
+}
+static void mptcp_attach_cgroup(struct sock *parent, struct sock *child)
+{
+ __mptcp_inherit_cgrp_data(parent, child);
if (mem_cgroup_sockets_enabled)
mem_cgroup_sk_inherit(parent, child);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
2025-11-07 21:55 ` [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper Paolo Abeni
2025-11-07 21:55 ` [PATCH mptcp-net 2/3] mptcp: factor-out cgroup data inherit helper Paolo Abeni
@ 2025-11-07 21:55 ` Paolo Abeni
2025-11-11 8:37 ` Geliang Tang
2025-11-08 15:15 ` [PATCH mptcp-net 0/3] " MPTCP CI
` (2 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2025-11-07 21:55 UTC (permalink / raw)
To: mptcp
The passive sockets never got proper memcg accounting: the msk
socket is associated with the memcg at accept time, but the
passive subflows never got it right.
At accept time, traverse the subflows list and associate each of them
with the msk memcg, and try to do the same at join completion time, if
the msk has been already accepted.
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/298
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/597
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++----------
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 10 ++++++++++
3 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 34e6bc731085..5e9325c7ea9c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -959,6 +959,8 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
mptcp_sockopt_sync_locked(msk, ssk);
mptcp_stop_tout_timer(sk);
+ __mptcp_inherit_cgrp_data(sk, ssk);
+ __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
__mptcp_propagate_sndbuf(sk, ssk);
__mptcp_propagate_rcvspace(sk, ssk);
return true;
@@ -4076,6 +4078,29 @@ static int mptcp_listen(struct socket *sock, int backlog)
return err;
}
+static void mptcp_graph_subflows(struct sock *sk)
+{
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ bool slow;
+
+ slow = lock_sock_fast(ssk);
+
+ /* set ssk->sk_socket of accept()ed flows to mptcp socket.
+ * This is needed so NOSPACE flag can be set from tcp stack.
+ */
+ if (!ssk->sk_socket)
+ mptcp_sock_graft(ssk, sk->sk_socket);
+
+ __mptcp_inherit_cgrp_data(sk, ssk);
+ __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
+ unlock_sock_fast(ssk, slow);
+ }
+}
+
static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
struct proto_accept_arg *arg)
{
@@ -4123,16 +4148,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
msk = mptcp_sk(newsk);
msk->in_accept_queue = 0;
- /* set ssk->sk_socket of accept()ed flows to mptcp socket.
- * This is needed so NOSPACE flag can be set from tcp stack.
- */
- mptcp_for_each_subflow(msk, subflow) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
- if (!ssk->sk_socket)
- mptcp_sock_graft(ssk, newsock);
- }
-
+ mptcp_graph_subflows(newsk);
mptcp_rps_record_subflows(msk);
/* Do late cleanup for the first subflow as necessary. Also
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 300ac7030958..60bfe50530dc 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -755,6 +755,7 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated)
return ret;
}
+void __mptcp_inherit_memcg(struct sock *sk, struct sock *ssk, gfp_t gfp);
void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk);
int mptcp_is_enabled(const struct net *net);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 599a21d92590..39bf69e73975 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1721,6 +1721,16 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_pm_local *local,
return err;
}
+void __mptcp_inherit_memcg(struct sock *sk, struct sock *ssk, gfp_t gfp)
+{
+ /* Only if the msk has been accepted already (and not orphaned).*/
+ if (!mem_cgroup_sockets_enabled || !sk->sk_socket)
+ return;
+
+ mem_cgroup_sk_inherit(sk, ssk);
+ __sk_charge(ssk, gfp);
+}
+
void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk)
{
#ifdef CONFIG_SOCK_CGROUP_DATA
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
` (2 preceding siblings ...)
2025-11-07 21:55 ` [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
@ 2025-11-08 15:15 ` MPTCP CI
2025-11-11 17:26 ` Matthieu Baerts
2025-11-12 10:10 ` Matthieu Baerts
5 siblings, 0 replies; 19+ messages in thread
From: MPTCP CI @ 2025-11-08 15:15 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Critical: KMemLeak ❌
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19194272317
Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/5451f3b28abd
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1021093
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-07 21:55 ` [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper Paolo Abeni
@ 2025-11-11 7:24 ` Geliang Tang
2025-11-11 17:38 ` Matthieu Baerts
1 sibling, 0 replies; 19+ messages in thread
From: Geliang Tang @ 2025-11-11 7:24 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On Fri, 2025-11-07 at 22:55 +0100, Paolo Abeni wrote:
> Move out of __inet_accept() the code dealing charging newly
> accepted socket to memcg. MPTCP will soon use it to on a per
> subflow basis, in different contexts.
>
> No functional changes intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch looks good to me.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Thanks,
-Geliang
> ---
> include/net/sock.h | 2 ++
> net/core/sock.c | 18 ++++++++++++++++++
> net/ipv4/af_inet.c | 17 +----------------
> 3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index a5f36ea9d46f..38d48cfe0741 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1631,6 +1631,8 @@ static inline void sk_mem_uncharge(struct sock
> *sk, int size)
> sk_mem_reclaim(sk);
> }
>
> +void __sk_charge(struct sock *sk, gfp_t gfp);
> +
> #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> static inline void sk_owner_set(struct sock *sk, struct module
> *owner)
> {
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 3b74fc71f51c..b26a6cdc9bcd 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3448,6 +3448,24 @@ void __sk_mem_reclaim(struct sock *sk, int
> amount)
> }
> EXPORT_SYMBOL(__sk_mem_reclaim);
>
> +void __sk_charge(struct sock *sk, gfp_t gfp)
> +{
> + int amt;
> +
> + gfp |= __GFP_NOFAIL;
> + if (mem_cgroup_from_sk(sk)) {
> + /* The socket has not been accepted yet, no need
> + * to look at newsk->sk_wmem_queued.
> + */
> + amt = sk_mem_pages(sk->sk_forward_alloc +
> + atomic_read(&sk->sk_rmem_alloc));
> + if (amt)
> + mem_cgroup_sk_charge(sk, amt, gfp);
> + }
> +
> + kmem_cache_charge(sk, gfp);
> +}
> +
> int sk_set_peek_off(struct sock *sk, int val)
> {
> WRITE_ONCE(sk->sk_peek_off, val);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index a31b94ce8968..08d811f11896 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
> void __inet_accept(struct socket *sock, struct socket *newsock,
> struct sock *newsk)
> {
> if (mem_cgroup_sockets_enabled) {
> - gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> -
> mem_cgroup_sk_alloc(newsk);
> -
> - if (mem_cgroup_from_sk(newsk)) {
> - int amt;
> -
> - /* The socket has not been accepted yet, no
> need
> - * to look at newsk->sk_wmem_queued.
> - */
> - amt = sk_mem_pages(newsk->sk_forward_alloc +
> - atomic_read(&newsk-
> >sk_rmem_alloc));
> - if (amt)
> - mem_cgroup_sk_charge(newsk, amt,
> gfp);
> - }
> -
> - kmem_cache_charge(newsk, gfp);
> + __sk_charge(newsk, GFP_KERNEL);
> }
>
> sock_rps_record_flow(newsk);
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 2/3] mptcp: factor-out cgroup data inherit helper
2025-11-07 21:55 ` [PATCH mptcp-net 2/3] mptcp: factor-out cgroup data inherit helper Paolo Abeni
@ 2025-11-11 7:24 ` Geliang Tang
0 siblings, 0 replies; 19+ messages in thread
From: Geliang Tang @ 2025-11-11 7:24 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On Fri, 2025-11-07 at 22:55 +0100, Paolo Abeni wrote:
> MPTCP will soon need the same functionality for passive sockets,
> factor them out in a common helper. No functional change intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch looks good to me.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Thanks,
-Geliang
> ---
> net/mptcp/protocol.h | 2 ++
> net/mptcp/subflow.c | 20 ++++++++++++--------
> 2 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 051f21b06d33..300ac7030958 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -755,6 +755,8 @@ mptcp_subflow_delegated_next(struct
> mptcp_delegated_action *delegated)
> return ret;
> }
>
> +void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk);
> +
> int mptcp_is_enabled(const struct net *net);
> unsigned int mptcp_get_add_addr_timeout(const struct net *net);
> int mptcp_is_checksum_enabled(const struct net *net);
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 60f100b2a0c9..599a21d92590 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1721,21 +1721,25 @@ int __mptcp_subflow_connect(struct sock *sk,
> const struct mptcp_pm_local *local,
> return err;
> }
>
> -static void mptcp_attach_cgroup(struct sock *parent, struct sock
> *child)
> +void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk)
> {
> #ifdef CONFIG_SOCK_CGROUP_DATA
> - struct sock_cgroup_data *parent_skcd = &parent-
> >sk_cgrp_data,
> - *child_skcd = &child->sk_cgrp_data;
> + struct sock_cgroup_data *sk_cd = &sk->sk_cgrp_data,
> + *ssk_cd = &ssk->sk_cgrp_data;
>
> /* only the additional subflows created by kworkers have to
> be modified */
> - if (cgroup_id(sock_cgroup_ptr(parent_skcd)) !=
> - cgroup_id(sock_cgroup_ptr(child_skcd))) {
> - cgroup_sk_free(child_skcd);
> - *child_skcd = *parent_skcd;
> - cgroup_sk_clone(child_skcd);
> + if (cgroup_id(sock_cgroup_ptr(sk_cd)) !=
> + cgroup_id(sock_cgroup_ptr(ssk_cd))) {
> + cgroup_sk_free(ssk_cd);
> + *ssk_cd = *sk_cd;
> + cgroup_sk_clone(sk_cd);
> }
> #endif /* CONFIG_SOCK_CGROUP_DATA */
> +}
>
> +static void mptcp_attach_cgroup(struct sock *parent, struct sock
> *child)
> +{
> + __mptcp_inherit_cgrp_data(parent, child);
> if (mem_cgroup_sockets_enabled)
> mem_cgroup_sk_inherit(parent, child);
> }
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets
2025-11-07 21:55 ` [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
@ 2025-11-11 8:37 ` Geliang Tang
2025-11-11 17:25 ` Matthieu Baerts
0 siblings, 1 reply; 19+ messages in thread
From: Geliang Tang @ 2025-11-11 8:37 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
Thanks for this fix.
On Fri, 2025-11-07 at 22:55 +0100, Paolo Abeni wrote:
> The passive sockets never got proper memcg accounting: the msk
> socket is associated with the memcg at accept time, but the
> passive subflows never got it right.
>
> At accept time, traverse the subflows list and associate each of them
> with the msk memcg, and try to do the same at join completion time,
> if
> the msk has been already accepted.
>
> Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming
> connections")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/298
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/597
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++----------
> net/mptcp/protocol.h | 1 +
> net/mptcp/subflow.c | 10 ++++++++++
> 3 files changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 34e6bc731085..5e9325c7ea9c 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -959,6 +959,8 @@ static bool __mptcp_finish_join(struct mptcp_sock
> *msk, struct sock *ssk)
> mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
> mptcp_sockopt_sync_locked(msk, ssk);
> mptcp_stop_tout_timer(sk);
> + __mptcp_inherit_cgrp_data(sk, ssk);
> + __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
In mptcp_graph_subflows(), these two lines already have the
mem_cgroup_from_sk() check (in the squash-to patch). It's unclear
whether mem_cgroup_from_sk() check needs to be added here too.
Thanks,
-Geliang
> __mptcp_propagate_sndbuf(sk, ssk);
> __mptcp_propagate_rcvspace(sk, ssk);
> return true;
> @@ -4076,6 +4078,29 @@ static int mptcp_listen(struct socket *sock,
> int backlog)
> return err;
> }
>
> +static void mptcp_graph_subflows(struct sock *sk)
> +{
> + struct mptcp_subflow_context *subflow;
> + struct mptcp_sock *msk = mptcp_sk(sk);
> +
> + mptcp_for_each_subflow(msk, subflow) {
> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> + bool slow;
> +
> + slow = lock_sock_fast(ssk);
> +
> + /* set ssk->sk_socket of accept()ed flows to mptcp
> socket.
> + * This is needed so NOSPACE flag can be set from
> tcp stack.
> + */
> + if (!ssk->sk_socket)
> + mptcp_sock_graft(ssk, sk->sk_socket);
> +
> + __mptcp_inherit_cgrp_data(sk, ssk);
> + __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
> + unlock_sock_fast(ssk, slow);
> + }
> +}
> +
> static int mptcp_stream_accept(struct socket *sock, struct socket
> *newsock,
> struct proto_accept_arg *arg)
> {
> @@ -4123,16 +4148,7 @@ static int mptcp_stream_accept(struct socket
> *sock, struct socket *newsock,
> msk = mptcp_sk(newsk);
> msk->in_accept_queue = 0;
>
> - /* set ssk->sk_socket of accept()ed flows to mptcp
> socket.
> - * This is needed so NOSPACE flag can be set from
> tcp stack.
> - */
> - mptcp_for_each_subflow(msk, subflow) {
> - struct sock *ssk =
> mptcp_subflow_tcp_sock(subflow);
> -
> - if (!ssk->sk_socket)
> - mptcp_sock_graft(ssk, newsock);
> - }
> -
> + mptcp_graph_subflows(newsk);
> mptcp_rps_record_subflows(msk);
>
> /* Do late cleanup for the first subflow as
> necessary. Also
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 300ac7030958..60bfe50530dc 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -755,6 +755,7 @@ mptcp_subflow_delegated_next(struct
> mptcp_delegated_action *delegated)
> return ret;
> }
>
> +void __mptcp_inherit_memcg(struct sock *sk, struct sock *ssk, gfp_t
> gfp);
> void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk);
>
> int mptcp_is_enabled(const struct net *net);
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 599a21d92590..39bf69e73975 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1721,6 +1721,16 @@ int __mptcp_subflow_connect(struct sock *sk,
> const struct mptcp_pm_local *local,
> return err;
> }
>
> +void __mptcp_inherit_memcg(struct sock *sk, struct sock *ssk, gfp_t
> gfp)
> +{
> + /* Only if the msk has been accepted already (and not
> orphaned).*/
> + if (!mem_cgroup_sockets_enabled || !sk->sk_socket)
> + return;
> +
> + mem_cgroup_sk_inherit(sk, ssk);
> + __sk_charge(ssk, gfp);
> +}
> +
> void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk)
> {
> #ifdef CONFIG_SOCK_CGROUP_DATA
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets
2025-11-11 8:37 ` Geliang Tang
@ 2025-11-11 17:25 ` Matthieu Baerts
2025-11-12 9:10 ` Paolo Abeni
0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-11 17:25 UTC (permalink / raw)
To: Geliang Tang; +Cc: Paolo Abeni, mptcp
Hi Geliang,
On 11/11/2025 09:37, Geliang Tang wrote:
> Hi Paolo,
>
> Thanks for this fix.
>
> On Fri, 2025-11-07 at 22:55 +0100, Paolo Abeni wrote:
>> The passive sockets never got proper memcg accounting: the msk
>> socket is associated with the memcg at accept time, but the
>> passive subflows never got it right.
>>
>> At accept time, traverse the subflows list and associate each of them
>> with the msk memcg, and try to do the same at join completion time,
>> if
>> the msk has been already accepted.
>>
>> Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming
>> connections")
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/298
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/597
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++----------
>> net/mptcp/protocol.h | 1 +
>> net/mptcp/subflow.c | 10 ++++++++++
>> 3 files changed, 37 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 34e6bc731085..5e9325c7ea9c 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -959,6 +959,8 @@ static bool __mptcp_finish_join(struct mptcp_sock
>> *msk, struct sock *ssk)
>> mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
>> mptcp_sockopt_sync_locked(msk, ssk);
>> mptcp_stop_tout_timer(sk);
>> + __mptcp_inherit_cgrp_data(sk, ssk);
>> + __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
>
> In mptcp_graph_subflows(), these two lines already have the
> mem_cgroup_from_sk() check (in the squash-to patch). It's unclear
> whether mem_cgroup_from_sk() check needs to be added here too.
From what I understood, no need to check mem_cgroup_from_sk() here:
- __mptcp_inherit_cgrp_data() should not depend on that (I think)
- __mptcp_inherit_memcg() is doing that via mem_cgroup_sockets_enabled()
I *think* the squash-to patch needs some small further modifications,
but this series looks good to me.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
` (3 preceding siblings ...)
2025-11-08 15:15 ` [PATCH mptcp-net 0/3] " MPTCP CI
@ 2025-11-11 17:26 ` Matthieu Baerts
2025-11-12 9:35 ` Paolo Abeni
2025-11-12 10:10 ` Matthieu Baerts
5 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-11 17:26 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 07/11/2025 22:55, Paolo Abeni wrote:
> It turns out that the splat reported earlier is indeed due to an old
> bug/missing feature that hit hard on top of "mptcp: borrow forward
> memory from subflow", as we can borrow accounted memory from a socket
> not doing any accounting;)
>
> It should land upstream before the above mentioned commit, ence the
> 'net' target, but it could as well be net-next material.
Thank you for these fixes!
They look good to me for 'net':
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> I think there is still a racing window to produce the splat reported in
> #597: if a subflow receive data while the msk is locked, and before the
> memcg is initialized on such subflow, it will push the data into the msk
> becklog, and later borrow will again try to grab accounted memory from
> non accounting ssk.
>
> I have a tentative fix for that one but it's quite ugly and I'll keep it
> separate from this series.
Good idea!
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-07 21:55 ` [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper Paolo Abeni
2025-11-11 7:24 ` Geliang Tang
@ 2025-11-11 17:38 ` Matthieu Baerts
2025-11-12 9:04 ` Paolo Abeni
1 sibling, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-11 17:38 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
On 07/11/2025 22:55, Paolo Abeni wrote:
> Move out of __inet_accept() the code dealing charging newly
> accepted socket to memcg. MPTCP will soon use it to on a per
> subflow basis, in different contexts.
>
> No functional changes intended.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/net/sock.h | 2 ++
> net/core/sock.c | 18 ++++++++++++++++++
> net/ipv4/af_inet.c | 17 +----------------
> 3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index a5f36ea9d46f..38d48cfe0741 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1631,6 +1631,8 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
> sk_mem_reclaim(sk);
> }
>
> +void __sk_charge(struct sock *sk, gfp_t gfp);
> +
> #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> static inline void sk_owner_set(struct sock *sk, struct module *owner)
> {
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 3b74fc71f51c..b26a6cdc9bcd 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3448,6 +3448,24 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
> }
> EXPORT_SYMBOL(__sk_mem_reclaim);
>
> +void __sk_charge(struct sock *sk, gfp_t gfp)
> +{
> + int amt;
> +
> + gfp |= __GFP_NOFAIL;
> + if (mem_cgroup_from_sk(sk)) {
> + /* The socket has not been accepted yet, no need
> + * to look at newsk->sk_wmem_queued.
> + */
> + amt = sk_mem_pages(sk->sk_forward_alloc +
> + atomic_read(&sk->sk_rmem_alloc));
> + if (amt)
> + mem_cgroup_sk_charge(sk, amt, gfp);
> + }
> +
> + kmem_cache_charge(sk, gfp);
> +}
> +
> int sk_set_peek_off(struct sock *sk, int val)
> {
> WRITE_ONCE(sk->sk_peek_off, val);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index a31b94ce8968..08d811f11896 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
> void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
> {
> if (mem_cgroup_sockets_enabled) {
> - gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> -
> mem_cgroup_sk_alloc(newsk);
> -
> - if (mem_cgroup_from_sk(newsk)) {
> - int amt;
> -
> - /* The socket has not been accepted yet, no need
> - * to look at newsk->sk_wmem_queued.
> - */
> - amt = sk_mem_pages(newsk->sk_forward_alloc +
> - atomic_read(&newsk->sk_rmem_alloc));
> - if (amt)
> - mem_cgroup_sk_charge(newsk, amt, gfp);
> - }
> -
> - kmem_cache_charge(newsk, gfp);
Mmh, this code has been moved from inet_csk_accept() to __inet_accept()
in net-next only, see commit 4a997d49d92a ("tcp: Save lock_sock() for
memcg in inet_csk_accept()."):
https://lore.kernel.org/20251014235604.3057003-2-kuniyu@google.com
Should we only apply these patches in our export branch, for net-next?
If yes, then I guess we should remove the Fixes tag in patch 3/3.
> + __sk_charge(newsk, GFP_KERNEL);
> }
>
> sock_rps_record_flow(newsk);
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-11 17:38 ` Matthieu Baerts
@ 2025-11-12 9:04 ` Paolo Abeni
2025-11-12 9:16 ` Matthieu Baerts
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2025-11-12 9:04 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: mptcp
On 11/11/25 6:38 PM, Matthieu Baerts wrote:
> On 07/11/2025 22:55, Paolo Abeni wrote:
>> Move out of __inet_accept() the code dealing charging newly
>> accepted socket to memcg. MPTCP will soon use it to on a per
>> subflow basis, in different contexts.
>>
>> No functional changes intended.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> include/net/sock.h | 2 ++
>> net/core/sock.c | 18 ++++++++++++++++++
>> net/ipv4/af_inet.c | 17 +----------------
>> 3 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index a5f36ea9d46f..38d48cfe0741 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1631,6 +1631,8 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
>> sk_mem_reclaim(sk);
>> }
>>
>> +void __sk_charge(struct sock *sk, gfp_t gfp);
>> +
>> #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
>> static inline void sk_owner_set(struct sock *sk, struct module *owner)
>> {
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 3b74fc71f51c..b26a6cdc9bcd 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3448,6 +3448,24 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
>> }
>> EXPORT_SYMBOL(__sk_mem_reclaim);
>>
>> +void __sk_charge(struct sock *sk, gfp_t gfp)
>> +{
>> + int amt;
>> +
>> + gfp |= __GFP_NOFAIL;
>> + if (mem_cgroup_from_sk(sk)) {
>> + /* The socket has not been accepted yet, no need
>> + * to look at newsk->sk_wmem_queued.
>> + */
>> + amt = sk_mem_pages(sk->sk_forward_alloc +
>> + atomic_read(&sk->sk_rmem_alloc));
>> + if (amt)
>> + mem_cgroup_sk_charge(sk, amt, gfp);
>> + }
>> +
>> + kmem_cache_charge(sk, gfp);
>> +}
>> +
>> int sk_set_peek_off(struct sock *sk, int val)
>> {
>> WRITE_ONCE(sk->sk_peek_off, val);
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index a31b94ce8968..08d811f11896 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
>> void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
>> {
>> if (mem_cgroup_sockets_enabled) {
>> - gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>> -
>> mem_cgroup_sk_alloc(newsk);
>> -
>> - if (mem_cgroup_from_sk(newsk)) {
>> - int amt;
>> -
>> - /* The socket has not been accepted yet, no need
>> - * to look at newsk->sk_wmem_queued.
>> - */
>> - amt = sk_mem_pages(newsk->sk_forward_alloc +
>> - atomic_read(&newsk->sk_rmem_alloc));
>> - if (amt)
>> - mem_cgroup_sk_charge(newsk, amt, gfp);
>> - }
>> -
>> - kmem_cache_charge(newsk, gfp);
>
> Mmh, this code has been moved from inet_csk_accept() to __inet_accept()
> in net-next only, see commit 4a997d49d92a ("tcp: Save lock_sock() for
> memcg in inet_csk_accept()."):
>
> https://lore.kernel.org/20251014235604.3057003-2-kuniyu@google.com
Right you are!
> Should we only apply these patches in our export branch, for net-next?
> If yes, then I guess we should remove the Fixes tag in patch 3/3.
Technically is a fix. Even prior to the backlog introduction bad thing
could happen, as passive msk does not have memory by the subflow only
accounted.
TCP-level OoO could potentially use a lot of system memory that will not
be memaccounted.
Unfortunately it's not easy to have a clean net patch.
I suppose with can have this as net-next fixes (including the fixes tag
in 3/3) as the change is invasive and the thing is broken since the
beginning.
WDYT?
/P
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets
2025-11-11 17:25 ` Matthieu Baerts
@ 2025-11-12 9:10 ` Paolo Abeni
0 siblings, 0 replies; 19+ messages in thread
From: Paolo Abeni @ 2025-11-12 9:10 UTC (permalink / raw)
To: Matthieu Baerts, Geliang Tang; +Cc: mptcp
On 11/11/25 6:25 PM, Matthieu Baerts wrote:
> On 11/11/2025 09:37, Geliang Tang wrote:
>> Hi Paolo,
>>
>> Thanks for this fix.
>>
>> On Fri, 2025-11-07 at 22:55 +0100, Paolo Abeni wrote:
>>> The passive sockets never got proper memcg accounting: the msk
>>> socket is associated with the memcg at accept time, but the
>>> passive subflows never got it right.
>>>
>>> At accept time, traverse the subflows list and associate each of them
>>> with the msk memcg, and try to do the same at join completion time,
>>> if
>>> the msk has been already accepted.
>>>
>>> Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming
>>> connections")
>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/298
>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/597
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++----------
>>> net/mptcp/protocol.h | 1 +
>>> net/mptcp/subflow.c | 10 ++++++++++
>>> 3 files changed, 37 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 34e6bc731085..5e9325c7ea9c 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -959,6 +959,8 @@ static bool __mptcp_finish_join(struct mptcp_sock
>>> *msk, struct sock *ssk)
>>> mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
>>> mptcp_sockopt_sync_locked(msk, ssk);
>>> mptcp_stop_tout_timer(sk);
>>> + __mptcp_inherit_cgrp_data(sk, ssk);
>>> + __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
>>
>> In mptcp_graph_subflows(), these two lines already have the
>> mem_cgroup_from_sk() check (in the squash-to patch). It's unclear
>> whether mem_cgroup_from_sk() check needs to be added here too.
>
> From what I understood, no need to check mem_cgroup_from_sk() here:
>
> - __mptcp_inherit_cgrp_data() should not depend on that (I think)
> - __mptcp_inherit_memcg() is doing that via mem_cgroup_sockets_enabled()
The mem_cgroup_sockets_enabled() check is optional, mainly used a
performance optimization, as it will cut the `than` branch via static
branch.
To keep the code small I chose to avoid the optimization here, but still
kept it in mptcp_graph_subflows() because there is much more stuff to be
avoided (a whole loop with per subflow lock pair).
/P
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-12 9:04 ` Paolo Abeni
@ 2025-11-12 9:16 ` Matthieu Baerts
2025-11-12 9:28 ` Paolo Abeni
0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-12 9:16 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your reply!
On 12/11/2025 10:04, Paolo Abeni wrote:
>
> On 11/11/25 6:38 PM, Matthieu Baerts wrote:
>> On 07/11/2025 22:55, Paolo Abeni wrote:
>>> Move out of __inet_accept() the code dealing charging newly
>>> accepted socket to memcg. MPTCP will soon use it to on a per
>>> subflow basis, in different contexts.
>>>
>>> No functional changes intended.
(...)
>>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>>> index a31b94ce8968..08d811f11896 100644
>>> --- a/net/ipv4/af_inet.c
>>> +++ b/net/ipv4/af_inet.c
>>> @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
>>> void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
>>> {
>>> if (mem_cgroup_sockets_enabled) {
>>> - gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>>> -
>>> mem_cgroup_sk_alloc(newsk);
>>> -
>>> - if (mem_cgroup_from_sk(newsk)) {
>>> - int amt;
>>> -
>>> - /* The socket has not been accepted yet, no need
>>> - * to look at newsk->sk_wmem_queued.
>>> - */
>>> - amt = sk_mem_pages(newsk->sk_forward_alloc +
>>> - atomic_read(&newsk->sk_rmem_alloc));
>>> - if (amt)
>>> - mem_cgroup_sk_charge(newsk, amt, gfp);
>>> - }
>>> -
>>> - kmem_cache_charge(newsk, gfp);
>>
>> Mmh, this code has been moved from inet_csk_accept() to __inet_accept()
>> in net-next only, see commit 4a997d49d92a ("tcp: Save lock_sock() for
>> memcg in inet_csk_accept()."):
>>
>> https://lore.kernel.org/20251014235604.3057003-2-kuniyu@google.com
>
> Right you are!
>
>> Should we only apply these patches in our export branch, for net-next?
>> If yes, then I guess we should remove the Fixes tag in patch 3/3.
>
> Technically is a fix. Even prior to the backlog introduction bad thing
> could happen, as passive msk does not have memory by the subflow only
> accounted.
>
> TCP-level OoO could potentially use a lot of system memory that will not
> be memaccounted.
>
> Unfortunately it's not easy to have a clean net patch.
>
> I suppose with can have this as net-next fixes (including the fixes tag
> in 3/3) as the change is invasive and the thing is broken since the
> beginning.
>
> WDYT?
It sounds good to me. When these patches will be in Linus tree, we can
ask to backport these 3 patches from this series, plus commit
4a997d49d92a ("tcp: Save lock_sock() for memcg in inet_csk_accept()."), no?
If yes, do you think I can add a "Cc: stable", and explaining in the
comments that it is because it depends on a patch that is only in
net-next, and the fix is not urgent?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-12 9:16 ` Matthieu Baerts
@ 2025-11-12 9:28 ` Paolo Abeni
2025-11-12 9:55 ` Matthieu Baerts
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2025-11-12 9:28 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: mptcp
On 11/12/25 10:16 AM, Matthieu Baerts wrote:
> On 12/11/2025 10:04, Paolo Abeni wrote:
>>
>> On 11/11/25 6:38 PM, Matthieu Baerts wrote:
>>> On 07/11/2025 22:55, Paolo Abeni wrote:
>>>> Move out of __inet_accept() the code dealing charging newly
>>>> accepted socket to memcg. MPTCP will soon use it to on a per
>>>> subflow basis, in different contexts.
>>>>
>>>> No functional changes intended.
>
> (...)
>
>>>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>>>> index a31b94ce8968..08d811f11896 100644
>>>> --- a/net/ipv4/af_inet.c
>>>> +++ b/net/ipv4/af_inet.c
>>>> @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
>>>> void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
>>>> {
>>>> if (mem_cgroup_sockets_enabled) {
>>>> - gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>>>> -
>>>> mem_cgroup_sk_alloc(newsk);
>>>> -
>>>> - if (mem_cgroup_from_sk(newsk)) {
>>>> - int amt;
>>>> -
>>>> - /* The socket has not been accepted yet, no need
>>>> - * to look at newsk->sk_wmem_queued.
>>>> - */
>>>> - amt = sk_mem_pages(newsk->sk_forward_alloc +
>>>> - atomic_read(&newsk->sk_rmem_alloc));
>>>> - if (amt)
>>>> - mem_cgroup_sk_charge(newsk, amt, gfp);
>>>> - }
>>>> -
>>>> - kmem_cache_charge(newsk, gfp);
>>>
>>> Mmh, this code has been moved from inet_csk_accept() to __inet_accept()
>>> in net-next only, see commit 4a997d49d92a ("tcp: Save lock_sock() for
>>> memcg in inet_csk_accept()."):
>>>
>>> https://lore.kernel.org/20251014235604.3057003-2-kuniyu@google.com
>>
>> Right you are!
>>
>>> Should we only apply these patches in our export branch, for net-next?
>>> If yes, then I guess we should remove the Fixes tag in patch 3/3.
>>
>> Technically is a fix. Even prior to the backlog introduction bad thing
>> could happen, as passive msk does not have memory by the subflow only
>> accounted.
>>
>> TCP-level OoO could potentially use a lot of system memory that will not
>> be memaccounted.
>>
>> Unfortunately it's not easy to have a clean net patch.
>>
>> I suppose with can have this as net-next fixes (including the fixes tag
>> in 3/3) as the change is invasive and the thing is broken since the
>> beginning.
>>
>> WDYT?
>
> It sounds good to me. When these patches will be in Linus tree, we can
> ask to backport these 3 patches from this series, plus commit
> 4a997d49d92a ("tcp: Save lock_sock() for memcg in inet_csk_accept()."), no?
>
> If yes, do you think I can add a "Cc: stable", and explaining in the
> comments that it is because it depends on a patch that is only in
> net-next, and the fix is not urgent?
You know I don't have all the fascination for stable;) My first take
would be to avoid such tag. If you are willing to do the extra mile I
guess that is a good plan. Alternatively you could just send the patch
to stable after merge, with pre-req included.
/P
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets
2025-11-11 17:26 ` Matthieu Baerts
@ 2025-11-12 9:35 ` Paolo Abeni
2025-11-12 9:58 ` Matthieu Baerts
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Abeni @ 2025-11-12 9:35 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 11/11/25 6:26 PM, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 07/11/2025 22:55, Paolo Abeni wrote:
>> It turns out that the splat reported earlier is indeed due to an old
>> bug/missing feature that hit hard on top of "mptcp: borrow forward
>> memory from subflow", as we can borrow accounted memory from a socket
>> not doing any accounting;)
>>
>> It should land upstream before the above mentioned commit, ence the
>> 'net' target, but it could as well be net-next material.
>
> Thank you for these fixes!
>
> They look good to me for 'net':
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
If you don't have applied these patches yet, please wait a bit: I'll
send a new revision including a minor delta to make 'Squash-to: "mptcp:
leverage the backlog for RX packet processing"' smaller/less ugly.
Thanks!
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper
2025-11-12 9:28 ` Paolo Abeni
@ 2025-11-12 9:55 ` Matthieu Baerts
0 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-12 9:55 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
On 12/11/2025 10:28, Paolo Abeni wrote:
> On 11/12/25 10:16 AM, Matthieu Baerts wrote:
>> On 12/11/2025 10:04, Paolo Abeni wrote:
>>>
>>> On 11/11/25 6:38 PM, Matthieu Baerts wrote:
>>>> On 07/11/2025 22:55, Paolo Abeni wrote:
>>>>> Move out of __inet_accept() the code dealing charging newly
>>>>> accepted socket to memcg. MPTCP will soon use it to on a per
>>>>> subflow basis, in different contexts.
>>>>>
>>>>> No functional changes intended.
>>
>> (...)
>>
>>>>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>>>>> index a31b94ce8968..08d811f11896 100644
>>>>> --- a/net/ipv4/af_inet.c
>>>>> +++ b/net/ipv4/af_inet.c
>>>>> @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
>>>>> void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
>>>>> {
>>>>> if (mem_cgroup_sockets_enabled) {
>>>>> - gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>>>>> -
>>>>> mem_cgroup_sk_alloc(newsk);
>>>>> -
>>>>> - if (mem_cgroup_from_sk(newsk)) {
>>>>> - int amt;
>>>>> -
>>>>> - /* The socket has not been accepted yet, no need
>>>>> - * to look at newsk->sk_wmem_queued.
>>>>> - */
>>>>> - amt = sk_mem_pages(newsk->sk_forward_alloc +
>>>>> - atomic_read(&newsk->sk_rmem_alloc));
>>>>> - if (amt)
>>>>> - mem_cgroup_sk_charge(newsk, amt, gfp);
>>>>> - }
>>>>> -
>>>>> - kmem_cache_charge(newsk, gfp);
>>>>
>>>> Mmh, this code has been moved from inet_csk_accept() to __inet_accept()
>>>> in net-next only, see commit 4a997d49d92a ("tcp: Save lock_sock() for
>>>> memcg in inet_csk_accept()."):
>>>>
>>>> https://lore.kernel.org/20251014235604.3057003-2-kuniyu@google.com
>>>
>>> Right you are!
>>>
>>>> Should we only apply these patches in our export branch, for net-next?
>>>> If yes, then I guess we should remove the Fixes tag in patch 3/3.
>>>
>>> Technically is a fix. Even prior to the backlog introduction bad thing
>>> could happen, as passive msk does not have memory by the subflow only
>>> accounted.
>>>
>>> TCP-level OoO could potentially use a lot of system memory that will not
>>> be memaccounted.
>>>
>>> Unfortunately it's not easy to have a clean net patch.
>>>
>>> I suppose with can have this as net-next fixes (including the fixes tag
>>> in 3/3) as the change is invasive and the thing is broken since the
>>> beginning.
>>>
>>> WDYT?
>>
>> It sounds good to me. When these patches will be in Linus tree, we can
>> ask to backport these 3 patches from this series, plus commit
>> 4a997d49d92a ("tcp: Save lock_sock() for memcg in inet_csk_accept()."), no?
>>
>> If yes, do you think I can add a "Cc: stable", and explaining in the
>> comments that it is because it depends on a patch that is only in
>> net-next, and the fix is not urgent?
>
> You know I don't have all the fascination for stable;)
I forgot :)
> My first take
> would be to avoid such tag. If you are willing to do the extra mile I
> guess that is a good plan. Alternatively you could just send the patch
> to stable after merge, with pre-req included.
Adding the "Cc: stable" allow me to forget about backporting it, except
if there are conflicts! But I can add a reminder :)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets
2025-11-12 9:35 ` Paolo Abeni
@ 2025-11-12 9:58 ` Matthieu Baerts
0 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-12 9:58 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
On 12/11/2025 10:35, Paolo Abeni wrote:
> On 11/11/25 6:26 PM, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 07/11/2025 22:55, Paolo Abeni wrote:
>>> It turns out that the splat reported earlier is indeed due to an old
>>> bug/missing feature that hit hard on top of "mptcp: borrow forward
>>> memory from subflow", as we can borrow accounted memory from a socket
>>> not doing any accounting;)
>>>
>>> It should land upstream before the above mentioned commit, ence the
>>> 'net' target, but it could as well be net-next material.
>>
>> Thank you for these fixes!
>>
>> They look good to me for 'net':
>>
>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>
> If you don't have applied these patches yet, please wait a bit: I'll
> send a new revision including a minor delta to make 'Squash-to: "mptcp:
> leverage the backlog for RX packet processing"' smaller/less ugly.
I was going to apply them to help you not to have to carry many patches,
but I can sure wait! :)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
` (4 preceding siblings ...)
2025-11-11 17:26 ` Matthieu Baerts
@ 2025-11-12 10:10 ` Matthieu Baerts
5 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2025-11-12 10:10 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
On 07/11/2025 22:55, Paolo Abeni wrote:
> It turns out that the splat reported earlier is indeed due to an old
> bug/missing feature that hit hard on top of "mptcp: borrow forward
> memory from subflow", as we can borrow accounted memory from a socket
> not doing any accounting;)
>
> It should land upstream before the above mentioned commit, ence the
> 'net' target, but it could as well be net-next material.
>
> I think there is still a racing window to produce the splat reported in
> #597: if a subflow receive data while the msk is locked, and before the
> memcg is initialized on such subflow, it will push the data into the msk
> becklog, and later borrow will again try to grab accounted memory from
> non accounting ssk.
>
> I have a tentative fix for that one but it's quite ugly and I'll keep it
> separate from this series.
As discussed on IRC, it is fine to already apply this series.
Now in our tree, feat. for net-next, before "mptcp: cleanup fallback
data fin reception".
New patches for t/upstream:
- d4e154d9235e: net: factor-out _sk_charge() helper
(Note: I'm going to change the RvB tags to Acked-by: non MPTCP change)
- c39f3876137b: mptcp: factor-out cgroup data inherit helper
- ce0882b539e6: mptcp: fix memcg accounting for passive sockets
- Results: 83290f9ea1dc..8ecdad1578d9 (export)
Tests are now in progress:
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/ddd5885869f67ae638bb3c8db14dfc80a1fe2bfd/checks
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-11-12 10:10 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-07 21:55 [PATCH mptcp-net 0/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
2025-11-07 21:55 ` [PATCH mptcp-net 1/3] net: factor-out _sk_charge() helper Paolo Abeni
2025-11-11 7:24 ` Geliang Tang
2025-11-11 17:38 ` Matthieu Baerts
2025-11-12 9:04 ` Paolo Abeni
2025-11-12 9:16 ` Matthieu Baerts
2025-11-12 9:28 ` Paolo Abeni
2025-11-12 9:55 ` Matthieu Baerts
2025-11-07 21:55 ` [PATCH mptcp-net 2/3] mptcp: factor-out cgroup data inherit helper Paolo Abeni
2025-11-11 7:24 ` Geliang Tang
2025-11-07 21:55 ` [PATCH mptcp-net 3/3] mptcp: fix memcg accounting for passive sockets Paolo Abeni
2025-11-11 8:37 ` Geliang Tang
2025-11-11 17:25 ` Matthieu Baerts
2025-11-12 9:10 ` Paolo Abeni
2025-11-08 15:15 ` [PATCH mptcp-net 0/3] " MPTCP CI
2025-11-11 17:26 ` Matthieu Baerts
2025-11-12 9:35 ` Paolo Abeni
2025-11-12 9:58 ` Matthieu Baerts
2025-11-12 10:10 ` Matthieu Baerts
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox