cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG.
@ 2025-08-14 20:08 Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n Kuniyuki Iwashima
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

This series converts most sk->sk_memcg access to helper functions
under CONFIG_MEMCG and finally defines sk_memcg under CONFIG_MEMCG.

This is v4 of the series linked below but without core changes
that decoupled memcg and global socket memory accounting.

I will defer the changes to a follow-up series that will use BPF
to store a flag in sk->sk_memcg.


Overview of the series:

  patch 1 is a bug fix for MPTCP
  patch 2 ~ 9 move sk->sk_memcg accesses to a single place
  patch 10 moves sk_memcg under CONFIG_MEMCG


Changes:
  v4:
    * Patch 1
      * Use set_active_memcg()

  v3: https://lore.kernel.org/netdev/20250812175848.512446-1-kuniyu@google.com/
    * Patch 12
      * Fix build failrue for kTLS (include <net/proto_memory.h>)

  v2: https://lore.kernel.org/netdev/20250811173116.2829786-1-kuniyu@google.com/
    * Remove per-memcg knob
    * Patch 11
      * Set flag on sk_memcg based on memory.max
    * Patch 12
      * Add sk_should_enter_memory_pressure() and cover
        tcp_enter_memory_pressure() calls
      * Update examples in changelog

  v1: https://lore.kernel.org/netdev/20250721203624.3807041-1-kuniyu@google.com/


Kuniyuki Iwashima (10):
  mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready().
  tcp: Simplify error path in inet_csk_accept().
  net: Call trace_sock_exceed_buf_limit() for memcg failure with
    SK_MEM_RECV.
  net: Clean up __sk_mem_raise_allocated().
  net-memcg: Introduce mem_cgroup_from_sk().
  net-memcg: Introduce mem_cgroup_sk_enabled().
  net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge().
  net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  net: Define sk_memcg under CONFIG_MEMCG.

 include/linux/memcontrol.h      | 39 ++++++++++++++--------------
 include/net/proto_memory.h      |  4 +--
 include/net/sock.h              | 46 +++++++++++++++++++++++++++++++++
 include/net/tcp.h               |  4 +--
 mm/memcontrol.c                 | 29 ++++++++++++++-------
 net/core/sock.c                 | 38 ++++++++++++++-------------
 net/ipv4/inet_connection_sock.c | 19 +++++++-------
 net/ipv4/tcp_output.c           |  5 ++--
 net/mptcp/protocol.h            |  4 +--
 net/mptcp/subflow.c             | 11 +++-----
 10 files changed, 124 insertions(+), 75 deletions(-)

-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 21:44   ` Shakeel Butt
  2025-08-14 20:08 ` [PATCH v4 net-next 02/10] mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready() Kuniyuki Iwashima
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
sk->sk_memcg based on the current task.

MPTCP subflow socket creation is triggered from userspace or
an in-kernel worker.

In the latter case, sk->sk_memcg is not what we want.  So, we fix
it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().

Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
under #ifdef CONFIG_SOCK_CGROUP_DATA.

The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
correctly.

Let's wrap sock_create_kern() for subflow with set_active_memcg()
using the parent sk->sk_memcg.

Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 mm/memcontrol.c     |  5 ++++-
 net/mptcp/subflow.c | 11 +++--------
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8dd7fbed5a94..450862e7fd7a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 	if (!in_task())
 		return;
 
+	memcg = current->active_memcg;
+
 	rcu_read_lock();
-	memcg = mem_cgroup_from_task(current);
+	if (likely(!memcg))
+		memcg = mem_cgroup_from_task(current);
 	if (mem_cgroup_is_root(memcg))
 		goto out;
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg1_tcpmem_active(memcg))
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 3f1b62a9fe88..a4809054ea6c 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1717,14 +1717,6 @@ static void mptcp_attach_cgroup(struct sock *parent, struct sock *child)
 	/* only the additional subflows created by kworkers have to be modified */
 	if (cgroup_id(sock_cgroup_ptr(parent_skcd)) !=
 	    cgroup_id(sock_cgroup_ptr(child_skcd))) {
-#ifdef CONFIG_MEMCG
-		struct mem_cgroup *memcg = parent->sk_memcg;
-
-		mem_cgroup_sk_free(child);
-		if (memcg && css_tryget(&memcg->css))
-			child->sk_memcg = memcg;
-#endif /* CONFIG_MEMCG */
-
 		cgroup_sk_free(child_skcd);
 		*child_skcd = *parent_skcd;
 		cgroup_sk_clone(child_skcd);
@@ -1757,6 +1749,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 {
 	struct mptcp_subflow_context *subflow;
 	struct net *net = sock_net(sk);
+	struct mem_cgroup *memcg;
 	struct socket *sf;
 	int err;
 
@@ -1766,7 +1759,9 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	if (unlikely(!sk->sk_socket))
 		return -EINVAL;
 
+	memcg = set_active_memcg(sk->sk_memcg);
 	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
+	set_active_memcg(memcg);
 	if (err)
 		return err;
 
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 02/10] mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 03/10] tcp: Simplify error path in inet_csk_accept() Kuniyuki Iwashima
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

Some conditions used in mptcp_epollin_ready() are the same as
tcp_under_memory_pressure().

We will modify tcp_under_memory_pressure() in the later patch.

Let's use tcp_under_memory_pressure() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index b15d7fab5c4b..a1787a1344ac 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -788,9 +788,7 @@ static inline bool mptcp_epollin_ready(const struct sock *sk)
 	 * as it can always coalesce them
 	 */
 	return (data_avail >= sk->sk_rcvlowat) ||
-	       (mem_cgroup_sockets_enabled && sk->sk_memcg &&
-		mem_cgroup_under_socket_pressure(sk->sk_memcg)) ||
-	       READ_ONCE(tcp_memory_pressure);
+		tcp_under_memory_pressure(sk);
 }
 
 int mptcp_set_rcvlowat(struct sock *sk, int val);
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 03/10] tcp: Simplify error path in inet_csk_accept().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 02/10] mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 04/10] net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV Kuniyuki Iwashima
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

When an error occurs in inet_csk_accept(), what we should do is
only call release_sock() and set the errno to arg->err.

But the path jumps to another label, which introduces unnecessary
initialisation and tests for newsk.

Let's simplify the error path and remove the redundant NULL
checks for newsk.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_connection_sock.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1e2df51427fe..724bd9ed6cd4 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -706,9 +706,9 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 		spin_unlock_bh(&queue->fastopenq.lock);
 	}
 
-out:
 	release_sock(sk);
-	if (newsk && mem_cgroup_sockets_enabled) {
+
+	if (mem_cgroup_sockets_enabled) {
 		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
 		int amt = 0;
 
@@ -732,18 +732,17 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 
 		release_sock(newsk);
 	}
+
 	if (req)
 		reqsk_put(req);
 
-	if (newsk)
-		inet_init_csk_locks(newsk);
-
+	inet_init_csk_locks(newsk);
 	return newsk;
+
 out_err:
-	newsk = NULL;
-	req = NULL;
+	release_sock(sk);
 	arg->err = error;
-	goto out;
+	return NULL;
 }
 EXPORT_SYMBOL(inet_csk_accept);
 
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 04/10] net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV.
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 03/10] tcp: Simplify error path in inet_csk_accept() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 05/10] net: Clean up __sk_mem_raise_allocated() Kuniyuki Iwashima
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

Initially, trace_sock_exceed_buf_limit() was invoked when
__sk_mem_raise_allocated() failed due to the memcg limit or the
global limit.

However, commit d6f19938eb031 ("net: expose sk wmem in
sock_exceed_buf_limit tracepoint") somehow suppressed the event
only when memcg failed to charge for SK_MEM_RECV, although the
memcg failure for SK_MEM_SEND still triggers the event.

Let's restore the event for SK_MEM_RECV.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 net/core/sock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7c26ec8dce63..380bc1aa6982 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3354,8 +3354,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 		}
 	}
 
-	if (kind == SK_MEM_SEND || (kind == SK_MEM_RECV && charged))
-		trace_sock_exceed_buf_limit(sk, prot, allocated, kind);
+	trace_sock_exceed_buf_limit(sk, prot, allocated, kind);
 
 	sk_memory_allocated_sub(sk, amt);
 
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 05/10] net: Clean up __sk_mem_raise_allocated().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 04/10] net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 06/10] net-memcg: Introduce mem_cgroup_from_sk() Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

In __sk_mem_raise_allocated(), charged is initialised as true due
to the weird condition removed in the previous patch.

It makes the variable unreliable by itself, so we have to check
another variable, memcg, in advance.

Also, we will factorise the common check below for memcg later.

    if (mem_cgroup_sockets_enabled && sk->sk_memcg)

As a prep, let's initialise charged as false and memcg as NULL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 net/core/sock.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 380bc1aa6982..000940ecf360 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3263,15 +3263,16 @@ EXPORT_SYMBOL(sk_wait_data);
  */
 int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 {
-	struct mem_cgroup *memcg = mem_cgroup_sockets_enabled ? sk->sk_memcg : NULL;
 	struct proto *prot = sk->sk_prot;
-	bool charged = true;
+	struct mem_cgroup *memcg = NULL;
+	bool charged = false;
 	long allocated;
 
 	sk_memory_allocated_add(sk, amt);
 	allocated = sk_memory_allocated(sk);
 
-	if (memcg) {
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg) {
+		memcg = sk->sk_memcg;
 		charged = mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge());
 		if (!charged)
 			goto suppress_allocation;
@@ -3358,7 +3359,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 
 	sk_memory_allocated_sub(sk, amt);
 
-	if (memcg && charged)
+	if (charged)
 		mem_cgroup_uncharge_skmem(memcg, amt);
 
 	return 0;
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 06/10] net-memcg: Introduce mem_cgroup_from_sk().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 05/10] net: Clean up __sk_mem_raise_allocated() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 21:51   ` Shakeel Butt
  2025-08-14 20:08 ` [PATCH v4 net-next 07/10] net-memcg: Introduce mem_cgroup_sk_enabled() Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

We will store a flag in the lowest bit of sk->sk_memcg.

Then, directly dereferencing sk->sk_memcg will be illegal, and we
do not want to allow touching the raw sk->sk_memcg in many places.

Let's introduce mem_cgroup_from_sk().

Other places accessing the raw sk->sk_memcg will be converted later.

Note that we cannot define the helper as an inline function in
memcontrol.h as we cannot access any fields of struct sock there
due to circular dependency, so it is placed in sock.h.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/net/sock.h              | 12 ++++++++++++
 mm/memcontrol.c                 |  6 ++++--
 net/ipv4/inet_connection_sock.c |  2 +-
 net/mptcp/subflow.c             |  2 +-
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c8a4b283df6f..811f95ea8d00 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2594,6 +2594,18 @@ static inline gfp_t gfp_memcg_charge(void)
 	return in_softirq() ? GFP_ATOMIC : GFP_KERNEL;
 }
 
+#ifdef CONFIG_MEMCG
+static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk)
+{
+	return sk->sk_memcg;
+}
+#else
+static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk)
+{
+	return NULL;
+}
+#endif
+
 static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 {
 	return noblock ? 0 : READ_ONCE(sk->sk_rcvtimeo);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 450862e7fd7a..1717c3a50f66 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5023,8 +5023,10 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 
 void mem_cgroup_sk_free(struct sock *sk)
 {
-	if (sk->sk_memcg)
-		css_put(&sk->sk_memcg->css);
+	struct mem_cgroup *memcg = mem_cgroup_from_sk(sk);
+
+	if (memcg)
+		css_put(&memcg->css);
 }
 
 /**
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 724bd9ed6cd4..93569bbe00f4 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -718,7 +718,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 		lock_sock(newsk);
 
 		mem_cgroup_sk_alloc(newsk);
-		if (newsk->sk_memcg) {
+		if (mem_cgroup_from_sk(newsk)) {
 			/* The socket has not been accepted yet, no need
 			 * to look at newsk->sk_wmem_queued.
 			 */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a4809054ea6c..70c45c092d13 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1759,7 +1759,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	if (unlikely(!sk->sk_socket))
 		return -EINVAL;
 
-	memcg = set_active_memcg(sk->sk_memcg);
+	memcg = set_active_memcg(mem_cgroup_from_sk(sk));
 	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
 	set_active_memcg(memcg);
 	if (err)
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 07/10] net-memcg: Introduce mem_cgroup_sk_enabled().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (5 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 06/10] net-memcg: Introduce mem_cgroup_from_sk() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 21:51   ` Shakeel Butt
  2025-08-14 20:08 ` [PATCH v4 net-next 08/10] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge() Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

The socket memcg feature is enabled by a static key and
only works for non-root cgroup.

We check both conditions in many places.

Let's factorise it as a helper function.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/net/proto_memory.h |  2 +-
 include/net/sock.h         | 10 ++++++++++
 include/net/tcp.h          |  2 +-
 net/core/sock.c            |  6 +++---
 net/ipv4/tcp_output.c      |  2 +-
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/proto_memory.h b/include/net/proto_memory.h
index a6ab2f4f5e28..859e63de81c4 100644
--- a/include/net/proto_memory.h
+++ b/include/net/proto_memory.h
@@ -31,7 +31,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 	if (!sk->sk_prot->memory_pressure)
 		return false;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	if (mem_cgroup_sk_enabled(sk) &&
 	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 811f95ea8d00..3efdf680401d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2599,11 +2599,21 @@ static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk)
 {
 	return sk->sk_memcg;
 }
+
+static inline bool mem_cgroup_sk_enabled(const struct sock *sk)
+{
+	return mem_cgroup_sockets_enabled && mem_cgroup_from_sk(sk);
+}
 #else
 static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk)
 {
 	return NULL;
 }
+
+static inline bool mem_cgroup_sk_enabled(const struct sock *sk)
+{
+	return false;
+}
 #endif
 
 static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 526a26e7a150..9f01b6be6444 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -275,7 +275,7 @@ extern unsigned long tcp_memory_pressure;
 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	if (mem_cgroup_sk_enabled(sk) &&
 	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 000940ecf360..ab658fe23e1e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1032,7 +1032,7 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 	bool charged;
 	int pages;
 
-	if (!mem_cgroup_sockets_enabled || !sk->sk_memcg || !sk_has_account(sk))
+	if (!mem_cgroup_sk_enabled(sk) || !sk_has_account(sk))
 		return -EOPNOTSUPP;
 
 	if (!bytes)
@@ -3271,7 +3271,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 	sk_memory_allocated_add(sk, amt);
 	allocated = sk_memory_allocated(sk);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg) {
+	if (mem_cgroup_sk_enabled(sk)) {
 		memcg = sk->sk_memcg;
 		charged = mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge());
 		if (!charged)
@@ -3398,7 +3398,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount)
 {
 	sk_memory_allocated_sub(sk, amount);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+	if (mem_cgroup_sk_enabled(sk))
 		mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
 
 	if (sk_under_global_memory_pressure(sk) &&
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index caf11920a878..37fb320e6f70 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3578,7 +3578,7 @@ void sk_forced_mem_schedule(struct sock *sk, int size)
 	sk_forward_alloc_add(sk, amt << PAGE_SHIFT);
 	sk_memory_allocated_add(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+	if (mem_cgroup_sk_enabled(sk))
 		mem_cgroup_charge_skmem(sk->sk_memcg, amt,
 					gfp_memcg_charge() | __GFP_NOFAIL);
 }
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 08/10] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (6 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 07/10] net-memcg: Introduce mem_cgroup_sk_enabled() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 21:54   ` Shakeel Butt
  2025-08-14 20:08 ` [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure() Kuniyuki Iwashima
  2025-08-14 20:08 ` [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG Kuniyuki Iwashima
  9 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

We will store a flag in the lowest bit of sk->sk_memcg.

Then, we cannot pass the raw pointer to mem_cgroup_charge_skmem()
and mem_cgroup_uncharge_skmem().

Let's pass struct sock to the functions.

While at it, they are renamed to match other functions starting
with mem_cgroup_sk_.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h      | 29 ++++++++++++++++++++++++-----
 mm/memcontrol.c                 | 18 +++++++++++-------
 net/core/sock.c                 | 24 +++++++++++-------------
 net/ipv4/inet_connection_sock.c |  2 +-
 net/ipv4/tcp_output.c           |  3 +--
 5 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 785173aa0739..ff008a345ce7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1596,14 +1596,15 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
-			     gfp_t gfp_mask);
-void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 #ifdef CONFIG_MEMCG
 extern struct static_key_false memcg_sockets_enabled_key;
 #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
+
 void mem_cgroup_sk_alloc(struct sock *sk);
 void mem_cgroup_sk_free(struct sock *sk);
+bool mem_cgroup_sk_charge(const struct sock *sk, unsigned int nr_pages,
+			  gfp_t gfp_mask);
+void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages);
 
 #if BITS_PER_LONG < 64
 static inline void mem_cgroup_set_socket_pressure(struct mem_cgroup *memcg)
@@ -1659,8 +1660,26 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
 void reparent_shrinker_deferred(struct mem_cgroup *memcg);
 #else
 #define mem_cgroup_sockets_enabled 0
-static inline void mem_cgroup_sk_alloc(struct sock *sk) { };
-static inline void mem_cgroup_sk_free(struct sock *sk) { };
+static inline void mem_cgroup_sk_alloc(struct sock *sk)
+{
+}
+
+static inline void mem_cgroup_sk_free(struct sock *sk)
+{
+}
+
+static inline bool mem_cgroup_sk_charge(const struct sock *sk,
+					unsigned int nr_pages,
+					gfp_t gfp_mask)
+{
+	return false;
+}
+
+static inline void mem_cgroup_sk_uncharge(const struct sock *sk,
+					  unsigned int nr_pages)
+{
+}
+
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
 	return false;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1717c3a50f66..02f5e574fea0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5030,17 +5030,19 @@ void mem_cgroup_sk_free(struct sock *sk)
 }
 
 /**
- * mem_cgroup_charge_skmem - charge socket memory
- * @memcg: memcg to charge
+ * mem_cgroup_sk_charge - charge socket memory
+ * @sk: socket in memcg to charge
  * @nr_pages: number of pages to charge
  * @gfp_mask: reclaim mode
  *
  * Charges @nr_pages to @memcg. Returns %true if the charge fit within
  * @memcg's configured limit, %false if it doesn't.
  */
-bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
-			     gfp_t gfp_mask)
+bool mem_cgroup_sk_charge(const struct sock *sk, unsigned int nr_pages,
+			  gfp_t gfp_mask)
 {
+	struct mem_cgroup *memcg = mem_cgroup_from_sk(sk);
+
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return memcg1_charge_skmem(memcg, nr_pages, gfp_mask);
 
@@ -5053,12 +5055,14 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 }
 
 /**
- * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @memcg: memcg to uncharge
+ * mem_cgroup_sk_uncharge - uncharge socket memory
+ * @sk: socket in memcg to uncharge
  * @nr_pages: number of pages to uncharge
  */
-void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages)
 {
+	struct mem_cgroup *memcg = mem_cgroup_from_sk(sk);
+
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		memcg1_uncharge_skmem(memcg, nr_pages);
 		return;
diff --git a/net/core/sock.c b/net/core/sock.c
index ab658fe23e1e..5537ca263858 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1041,8 +1041,8 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 	pages = sk_mem_pages(bytes);
 
 	/* pre-charge to memcg */
-	charged = mem_cgroup_charge_skmem(sk->sk_memcg, pages,
-					  GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+	charged = mem_cgroup_sk_charge(sk, pages,
+				       GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!charged)
 		return -ENOMEM;
 
@@ -1054,7 +1054,7 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 	 */
 	if (allocated > sk_prot_mem_limits(sk, 1)) {
 		sk_memory_allocated_sub(sk, pages);
-		mem_cgroup_uncharge_skmem(sk->sk_memcg, pages);
+		mem_cgroup_sk_uncharge(sk, pages);
 		return -ENOMEM;
 	}
 	sk_forward_alloc_add(sk, pages << PAGE_SHIFT);
@@ -3263,17 +3263,16 @@ EXPORT_SYMBOL(sk_wait_data);
  */
 int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 {
+	bool memcg_enabled = false, charged = false;
 	struct proto *prot = sk->sk_prot;
-	struct mem_cgroup *memcg = NULL;
-	bool charged = false;
 	long allocated;
 
 	sk_memory_allocated_add(sk, amt);
 	allocated = sk_memory_allocated(sk);
 
 	if (mem_cgroup_sk_enabled(sk)) {
-		memcg = sk->sk_memcg;
-		charged = mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge());
+		memcg_enabled = true;
+		charged = mem_cgroup_sk_charge(sk, amt, gfp_memcg_charge());
 		if (!charged)
 			goto suppress_allocation;
 	}
@@ -3347,10 +3346,9 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 		 */
 		if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) {
 			/* Force charge with __GFP_NOFAIL */
-			if (memcg && !charged) {
-				mem_cgroup_charge_skmem(memcg, amt,
-					gfp_memcg_charge() | __GFP_NOFAIL);
-			}
+			if (memcg_enabled && !charged)
+				mem_cgroup_sk_charge(sk, amt,
+						     gfp_memcg_charge() | __GFP_NOFAIL);
 			return 1;
 		}
 	}
@@ -3360,7 +3358,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 	sk_memory_allocated_sub(sk, amt);
 
 	if (charged)
-		mem_cgroup_uncharge_skmem(memcg, amt);
+		mem_cgroup_sk_uncharge(sk, amt);
 
 	return 0;
 }
@@ -3399,7 +3397,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount)
 	sk_memory_allocated_sub(sk, amount);
 
 	if (mem_cgroup_sk_enabled(sk))
-		mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
+		mem_cgroup_sk_uncharge(sk, amount);
 
 	if (sk_under_global_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 93569bbe00f4..0ef1eacd539d 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -727,7 +727,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 		}
 
 		if (amt)
-			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
+			mem_cgroup_sk_charge(newsk, amt, gfp);
 		kmem_cache_charge(newsk, gfp);
 
 		release_sock(newsk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 37fb320e6f70..dfbac0876d96 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3579,8 +3579,7 @@ void sk_forced_mem_schedule(struct sock *sk, int size)
 	sk_memory_allocated_add(sk, amt);
 
 	if (mem_cgroup_sk_enabled(sk))
-		mem_cgroup_charge_skmem(sk->sk_memcg, amt,
-					gfp_memcg_charge() | __GFP_NOFAIL);
+		mem_cgroup_sk_charge(sk, amt, gfp_memcg_charge() | __GFP_NOFAIL);
 }
 
 /* Send a FIN. The caller locks the socket for us.
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (7 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 08/10] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 22:00   ` Shakeel Butt
  2025-08-14 20:08 ` [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG Kuniyuki Iwashima
  9 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

We will store a flag in the lowest bit of sk->sk_memcg.

Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure().

Let's pass struct sock to it and rename the function to match other
functions starting with mem_cgroup_sk_.

Note that the helper is moved to sock.h to use mem_cgroup_from_sk().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h | 18 ------------------
 include/net/proto_memory.h |  2 +-
 include/net/sock.h         | 22 ++++++++++++++++++++++
 include/net/tcp.h          |  2 +-
 4 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ff008a345ce7..723cc61410ae 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1641,19 +1641,6 @@ static inline u64 mem_cgroup_get_socket_pressure(struct mem_cgroup *memcg)
 }
 #endif
 
-static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
-{
-#ifdef CONFIG_MEMCG_V1
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		return !!memcg->tcpmem_pressure;
-#endif /* CONFIG_MEMCG_V1 */
-	do {
-		if (time_before64(get_jiffies_64(), mem_cgroup_get_socket_pressure(memcg)))
-			return true;
-	} while ((memcg = parent_mem_cgroup(memcg)));
-	return false;
-}
-
 int alloc_shrinker_info(struct mem_cgroup *memcg);
 void free_shrinker_info(struct mem_cgroup *memcg);
 void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
@@ -1680,11 +1667,6 @@ static inline void mem_cgroup_sk_uncharge(const struct sock *sk,
 {
 }
 
-static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
-{
-	return false;
-}
-
 static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 				    int nid, int shrinker_id)
 {
diff --git a/include/net/proto_memory.h b/include/net/proto_memory.h
index 859e63de81c4..8e91a8fa31b5 100644
--- a/include/net/proto_memory.h
+++ b/include/net/proto_memory.h
@@ -32,7 +32,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 		return false;
 
 	if (mem_cgroup_sk_enabled(sk) &&
-	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
+	    mem_cgroup_sk_under_memory_pressure(sk))
 		return true;
 
 	return !!READ_ONCE(*sk->sk_prot->memory_pressure);
diff --git a/include/net/sock.h b/include/net/sock.h
index 3efdf680401d..3bc4d566f7d0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2604,6 +2604,23 @@ static inline bool mem_cgroup_sk_enabled(const struct sock *sk)
 {
 	return mem_cgroup_sockets_enabled && mem_cgroup_from_sk(sk);
 }
+
+static inline bool mem_cgroup_sk_under_memory_pressure(const struct sock *sk)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_sk(sk);
+
+#ifdef CONFIG_MEMCG_V1
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return !!memcg->tcpmem_pressure;
+#endif /* CONFIG_MEMCG_V1 */
+
+	do {
+		if (time_before64(get_jiffies_64(), mem_cgroup_get_socket_pressure(memcg)))
+			return true;
+	} while ((memcg = parent_mem_cgroup(memcg)));
+
+	return false;
+}
 #else
 static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk)
 {
@@ -2614,6 +2631,11 @@ static inline bool mem_cgroup_sk_enabled(const struct sock *sk)
 {
 	return false;
 }
+
+static inline bool mem_cgroup_sk_under_memory_pressure(const struct sock *sk)
+{
+	return false;
+}
 #endif
 
 static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9f01b6be6444..2936b8175950 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -276,7 +276,7 @@ extern unsigned long tcp_memory_pressure;
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
 	if (mem_cgroup_sk_enabled(sk) &&
-	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
+	    mem_cgroup_sk_under_memory_pressure(sk))
 		return true;
 
 	return READ_ONCE(tcp_memory_pressure);
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG.
  2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
                   ` (8 preceding siblings ...)
  2025-08-14 20:08 ` [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure() Kuniyuki Iwashima
@ 2025-08-14 20:08 ` Kuniyuki Iwashima
  2025-08-14 20:21   ` Roman Gushchin
  2025-08-14 22:10   ` Shakeel Butt
  9 siblings, 2 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 20:08 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, Michal Koutný, Tejun Heo
  Cc: Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, mptcp, cgroups,
	linux-mm

Except for sk_clone_lock(), all accesses to sk->sk_memcg
is done under CONFIG_MEMCG.

As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h | 2 ++
 net/core/sock.c    | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 3bc4d566f7d0..1c49ea13af4a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -443,7 +443,9 @@ struct sock {
 	__cacheline_group_begin(sock_read_rxtx);
 	int			sk_err;
 	struct socket		*sk_socket;
+#ifdef CONFIG_MEMCG
 	struct mem_cgroup	*sk_memcg;
+#endif
 #ifdef CONFIG_XFRM
 	struct xfrm_policy __rcu *sk_policy[2];
 #endif
diff --git a/net/core/sock.c b/net/core/sock.c
index 5537ca263858..ab6953d295df 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2512,8 +2512,10 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 	sock_reset_flag(newsk, SOCK_DONE);
 
+#ifdef CONFIG_MEMCG
 	/* sk->sk_memcg will be populated at accept() time */
 	newsk->sk_memcg = NULL;
+#endif
 
 	cgroup_sk_clone(&newsk->sk_cgrp_data);
 
@@ -4452,7 +4454,9 @@ static int __init sock_struct_check(void)
 
 	CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_rxtx, sk_err);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_rxtx, sk_socket);
+#ifdef CONFIG_MEMCG
 	CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_rxtx, sk_memcg);
+#endif
 
 	CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx, sk_lock);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx, sk_reserved_mem);
-- 
2.51.0.rc1.163.g2494970778-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG.
  2025-08-14 20:08 ` [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG Kuniyuki Iwashima
@ 2025-08-14 20:21   ` Roman Gushchin
  2025-08-14 22:10   ` Shakeel Butt
  1 sibling, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2025-08-14 20:21 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

Kuniyuki Iwashima <kuniyu@google.com> writes:

> Except for sk_clone_lock(), all accesses to sk->sk_memcg
> is done under CONFIG_MEMCG.
>
> As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-14 20:08 ` [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n Kuniyuki Iwashima
@ 2025-08-14 21:44   ` Shakeel Butt
  2025-08-14 23:27     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 21:44 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> sk->sk_memcg based on the current task.
> 
> MPTCP subflow socket creation is triggered from userspace or
> an in-kernel worker.
> 
> In the latter case, sk->sk_memcg is not what we want.  So, we fix
> it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> 
> Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> under #ifdef CONFIG_SOCK_CGROUP_DATA.
> 
> The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> correctly.
> 
> Let's wrap sock_create_kern() for subflow with set_active_memcg()
> using the parent sk->sk_memcg.
> 
> Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> Suggested-by: Michal Koutný <mkoutny@suse.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> ---
>  mm/memcontrol.c     |  5 ++++-
>  net/mptcp/subflow.c | 11 +++--------
>  2 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8dd7fbed5a94..450862e7fd7a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
>  	if (!in_task())
>  		return;
>  
> +	memcg = current->active_memcg;
> +

Use active_memcg() instead of current->active_memcg and do before the
!in_task() check.

Basically something like following:

	memcg = active_memcg();
	/* Do not associate the sock with unrelated interrupted task's memcg. */
	if (!in_task() && !memcg)
		return;

>  	rcu_read_lock();
> -	memcg = mem_cgroup_from_task(current);
> +	if (likely(!memcg))
> +		memcg = mem_cgroup_from_task(current);
>  	if (mem_cgroup_is_root(memcg))
>  		goto out;
>  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg1_tcpmem_active(memcg))
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 3f1b62a9fe88..a4809054ea6c 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1717,14 +1717,6 @@ static void mptcp_attach_cgroup(struct sock *parent, struct sock *child)
>  	/* only the additional subflows created by kworkers have to be modified */
>  	if (cgroup_id(sock_cgroup_ptr(parent_skcd)) !=
>  	    cgroup_id(sock_cgroup_ptr(child_skcd))) {
> -#ifdef CONFIG_MEMCG
> -		struct mem_cgroup *memcg = parent->sk_memcg;
> -
> -		mem_cgroup_sk_free(child);
> -		if (memcg && css_tryget(&memcg->css))
> -			child->sk_memcg = memcg;
> -#endif /* CONFIG_MEMCG */
> -
>  		cgroup_sk_free(child_skcd);
>  		*child_skcd = *parent_skcd;
>  		cgroup_sk_clone(child_skcd);
> @@ -1757,6 +1749,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>  {
>  	struct mptcp_subflow_context *subflow;
>  	struct net *net = sock_net(sk);
> +	struct mem_cgroup *memcg;
>  	struct socket *sf;
>  	int err;
>  
> @@ -1766,7 +1759,9 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>  	if (unlikely(!sk->sk_socket))
>  		return -EINVAL;
>  
> +	memcg = set_active_memcg(sk->sk_memcg);
>  	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
> +	set_active_memcg(memcg);
>  	if (err)
>  		return err;
>  
> -- 
> 2.51.0.rc1.163.g2494970778-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 06/10] net-memcg: Introduce mem_cgroup_from_sk().
  2025-08-14 20:08 ` [PATCH v4 net-next 06/10] net-memcg: Introduce mem_cgroup_from_sk() Kuniyuki Iwashima
@ 2025-08-14 21:51   ` Shakeel Butt
  0 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 21:51 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 08:08:38PM +0000, Kuniyuki Iwashima wrote:
> We will store a flag in the lowest bit of sk->sk_memcg.
> 
> Then, directly dereferencing sk->sk_memcg will be illegal, and we
> do not want to allow touching the raw sk->sk_memcg in many places.
> 
> Let's introduce mem_cgroup_from_sk().
> 
> Other places accessing the raw sk->sk_memcg will be converted later.
> 
> Note that we cannot define the helper as an inline function in
> memcontrol.h as we cannot access any fields of struct sock there
> due to circular dependency, so it is placed in sock.h.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 07/10] net-memcg: Introduce mem_cgroup_sk_enabled().
  2025-08-14 20:08 ` [PATCH v4 net-next 07/10] net-memcg: Introduce mem_cgroup_sk_enabled() Kuniyuki Iwashima
@ 2025-08-14 21:51   ` Shakeel Butt
  0 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 21:51 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 08:08:39PM +0000, Kuniyuki Iwashima wrote:
> The socket memcg feature is enabled by a static key and
> only works for non-root cgroup.
> 
> We check both conditions in many places.
> 
> Let's factorise it as a helper function.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 08/10] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge().
  2025-08-14 20:08 ` [PATCH v4 net-next 08/10] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge() Kuniyuki Iwashima
@ 2025-08-14 21:54   ` Shakeel Butt
  0 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 21:54 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 08:08:40PM +0000, Kuniyuki Iwashima wrote:
> We will store a flag in the lowest bit of sk->sk_memcg.
> 
> Then, we cannot pass the raw pointer to mem_cgroup_charge_skmem()
> and mem_cgroup_uncharge_skmem().
> 
> Let's pass struct sock to the functions.
> 
> While at it, they are renamed to match other functions starting
> with mem_cgroup_sk_.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  2025-08-14 20:08 ` [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure() Kuniyuki Iwashima
@ 2025-08-14 22:00   ` Shakeel Butt
  2025-08-14 22:10     ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 22:00 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 08:08:41PM +0000, Kuniyuki Iwashima wrote:
> We will store a flag in the lowest bit of sk->sk_memcg.
> 
> Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure().
> 
> Let's pass struct sock to it and rename the function to match other
> functions starting with mem_cgroup_sk_.
> 
> Note that the helper is moved to sock.h to use mem_cgroup_from_sk().

Please keep it in the memcontrol.h.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  2025-08-14 22:00   ` Shakeel Butt
@ 2025-08-14 22:10     ` Shakeel Butt
  2025-08-14 23:22       ` Kuniyuki Iwashima
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 22:10 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 03:00:05PM -0700, Shakeel Butt wrote:
> On Thu, Aug 14, 2025 at 08:08:41PM +0000, Kuniyuki Iwashima wrote:
> > We will store a flag in the lowest bit of sk->sk_memcg.
> > 
> > Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure().
> > 
> > Let's pass struct sock to it and rename the function to match other
> > functions starting with mem_cgroup_sk_.
> > 
> > Note that the helper is moved to sock.h to use mem_cgroup_from_sk().
> 
> Please keep it in the memcontrol.h.
> 

Oh is this due to struct sock is not yet defined and thus sk->sk_memcg
will build fail?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG.
  2025-08-14 20:08 ` [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG Kuniyuki Iwashima
  2025-08-14 20:21   ` Roman Gushchin
@ 2025-08-14 22:10   ` Shakeel Butt
  1 sibling, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 22:10 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 08:08:42PM +0000, Kuniyuki Iwashima wrote:
> Except for sk_clone_lock(), all accesses to sk->sk_memcg
> is done under CONFIG_MEMCG.
> 
> As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  2025-08-14 22:10     ` Shakeel Butt
@ 2025-08-14 23:22       ` Kuniyuki Iwashima
  0 siblings, 0 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 23:22 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 3:10 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 14, 2025 at 03:00:05PM -0700, Shakeel Butt wrote:
> > On Thu, Aug 14, 2025 at 08:08:41PM +0000, Kuniyuki Iwashima wrote:
> > > We will store a flag in the lowest bit of sk->sk_memcg.
> > >
> > > Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure().
> > >
> > > Let's pass struct sock to it and rename the function to match other
> > > functions starting with mem_cgroup_sk_.
> > >
> > > Note that the helper is moved to sock.h to use mem_cgroup_from_sk().
> >
> > Please keep it in the memcontrol.h.
> >
>
> Oh is this due to struct sock is not yet defined and thus sk->sk_memcg
> will build fail?

Right, we can't touch any field of struct sock in memcontrol.h as
noted in patch 6.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-14 21:44   ` Shakeel Butt
@ 2025-08-14 23:27     ` Kuniyuki Iwashima
  2025-08-14 23:46       ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-14 23:27 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > sk->sk_memcg based on the current task.
> >
> > MPTCP subflow socket creation is triggered from userspace or
> > an in-kernel worker.
> >
> > In the latter case, sk->sk_memcg is not what we want.  So, we fix
> > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> >
> > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> >
> > The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > correctly.
> >
> > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > using the parent sk->sk_memcg.
> >
> > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > Suggested-by: Michal Koutný <mkoutny@suse.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > ---
> >  mm/memcontrol.c     |  5 ++++-
> >  net/mptcp/subflow.c | 11 +++--------
> >  2 files changed, 7 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 8dd7fbed5a94..450862e7fd7a 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> >       if (!in_task())
> >               return;
> >
> > +     memcg = current->active_memcg;
> > +
>
> Use active_memcg() instead of current->active_memcg and do before the
> !in_task() check.

Why not reuse the !in_task() check here ?
We never use int_active_memcg for socket and also
know int_active_memcg is always NULL here.


>
> Basically something like following:
>
>         memcg = active_memcg();
>         /* Do not associate the sock with unrelated interrupted task's memcg. */
>         if (!in_task() && !memcg)
>                 return;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-14 23:27     ` Kuniyuki Iwashima
@ 2025-08-14 23:46       ` Shakeel Butt
  2025-08-15  0:05         ` Kuniyuki Iwashima
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2025-08-14 23:46 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
> On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > > sk->sk_memcg based on the current task.
> > >
> > > MPTCP subflow socket creation is triggered from userspace or
> > > an in-kernel worker.
> > >
> > > In the latter case, sk->sk_memcg is not what we want.  So, we fix
> > > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> > >
> > > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> > >
> > > The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> > > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > > correctly.
> > >
> > > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > > using the parent sk->sk_memcg.
> > >
> > > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > > Suggested-by: Michal Koutný <mkoutny@suse.com>
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > ---
> > >  mm/memcontrol.c     |  5 ++++-
> > >  net/mptcp/subflow.c | 11 +++--------
> > >  2 files changed, 7 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 8dd7fbed5a94..450862e7fd7a 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > >       if (!in_task())
> > >               return;
> > >
> > > +     memcg = current->active_memcg;
> > > +
> >
> > Use active_memcg() instead of current->active_memcg and do before the
> > !in_task() check.
> 
> Why not reuse the !in_task() check here ?
> We never use int_active_memcg for socket and also
> know int_active_memcg is always NULL here.
> 

If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
infra then make it work for both in_task() and !in_task() contexts.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-14 23:46       ` Shakeel Butt
@ 2025-08-15  0:05         ` Kuniyuki Iwashima
  2025-08-15  1:05           ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-15  0:05 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
> > On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > > > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > > > sk->sk_memcg based on the current task.
> > > >
> > > > MPTCP subflow socket creation is triggered from userspace or
> > > > an in-kernel worker.
> > > >
> > > > In the latter case, sk->sk_memcg is not what we want.  So, we fix
> > > > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> > > >
> > > > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > > > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> > > >
> > > > The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> > > > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > > > correctly.
> > > >
> > > > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > > > using the parent sk->sk_memcg.
> > > >
> > > > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > > > Suggested-by: Michal Koutný <mkoutny@suse.com>
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > > ---
> > > >  mm/memcontrol.c     |  5 ++++-
> > > >  net/mptcp/subflow.c | 11 +++--------
> > > >  2 files changed, 7 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 8dd7fbed5a94..450862e7fd7a 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > > >       if (!in_task())
> > > >               return;
> > > >
> > > > +     memcg = current->active_memcg;
> > > > +
> > >
> > > Use active_memcg() instead of current->active_memcg and do before the
> > > !in_task() check.
> >
> > Why not reuse the !in_task() check here ?
> > We never use int_active_memcg for socket and also
> > know int_active_memcg is always NULL here.
> >
>
> If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
> infra then make it work for both in_task() and !in_task() contexts.

Considering e876ecc67db80, then I think we should add
set_active_memcg_in_task() and active_memcg_in_task().

or at least we need WARN_ON() if we want to place active_memcg()
before the in_task() check, but this looks ugly.

        memcg = active_memcg();
        if (!in_task() && !memcg)
                return;
        DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg))

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-15  0:05         ` Kuniyuki Iwashima
@ 2025-08-15  1:05           ` Shakeel Butt
  2025-08-15  2:31             ` Kuniyuki Iwashima
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2025-08-15  1:05 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 05:05:56PM -0700, Kuniyuki Iwashima wrote:
> On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
> > > On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > > > > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > > > > sk->sk_memcg based on the current task.
> > > > >
> > > > > MPTCP subflow socket creation is triggered from userspace or
> > > > > an in-kernel worker.
> > > > >
> > > > > In the latter case, sk->sk_memcg is not what we want.  So, we fix
> > > > > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> > > > >
> > > > > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > > > > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> > > > >
> > > > > The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> > > > > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > > > > correctly.
> > > > >
> > > > > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > > > > using the parent sk->sk_memcg.
> > > > >
> > > > > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > > > > Suggested-by: Michal Koutný <mkoutny@suse.com>
> > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > > > ---
> > > > >  mm/memcontrol.c     |  5 ++++-
> > > > >  net/mptcp/subflow.c | 11 +++--------
> > > > >  2 files changed, 7 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index 8dd7fbed5a94..450862e7fd7a 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > > > >       if (!in_task())
> > > > >               return;
> > > > >
> > > > > +     memcg = current->active_memcg;
> > > > > +
> > > >
> > > > Use active_memcg() instead of current->active_memcg and do before the
> > > > !in_task() check.
> > >
> > > Why not reuse the !in_task() check here ?
> > > We never use int_active_memcg for socket and also
> > > know int_active_memcg is always NULL here.
> > >
> >
> > If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
> > infra then make it work for both in_task() and !in_task() contexts.
> 
> Considering e876ecc67db80, then I think we should add
> set_active_memcg_in_task() and active_memcg_in_task().
> 
> or at least we need WARN_ON() if we want to place active_memcg()
> before the in_task() check, but this looks ugly.
> 
>         memcg = active_memcg();
>         if (!in_task() && !memcg)
>                 return;
>         DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg))

You don't have to use the code as is. It is just an example. Basically I
am asking if in future someone does the following:

	// in !in_task() context
	old_memcg = set_active_memcg(new_memcg);
	sk = sk_alloc();
	set_active_memcg(old_memcg);

mem_cgroup_sk_alloc() should work and associate the sk with the
new_memcg.

You can manually inline active_memcg() function to avoid multiple
in_task() checks like below:

void mem_cgroup_sk_alloc(struct sock *sk)
{
	struct mem_cgroup *memcg;

	if (!mem_cgroup_sockets_enabled)
		return;
	
	if (!in_task()) {
		memcg = this_cpu_read(int_active_memcg);

		/*
		 * Do not associate the sock with unrelated interrupted
		 * task's memcg.
		 */
		if (!memcg)
			return;
	} else {
		memcg = current->active_memcg;
	}

	rcu_read_lock();
	if (likely(!memcg))
		memcg = mem_cgroup_from_task(current);
	if (mem_cgroup_is_root(memcg))
		goto out;
	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg1_tcpmem_active(memcg))
		goto out;
	if (css_tryget(&memcg->css))
		sk->sk_memcg = memcg;
out:
	rcu_read_unlock();
}


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-15  1:05           ` Shakeel Butt
@ 2025-08-15  2:31             ` Kuniyuki Iwashima
  2025-08-15 17:24               ` Kuniyuki Iwashima
  0 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-15  2:31 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 6:06 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 14, 2025 at 05:05:56PM -0700, Kuniyuki Iwashima wrote:
> > On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
> > > > On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > > >
> > > > > On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > > > > > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > > > > > sk->sk_memcg based on the current task.
> > > > > >
> > > > > > MPTCP subflow socket creation is triggered from userspace or
> > > > > > an in-kernel worker.
> > > > > >
> > > > > > In the latter case, sk->sk_memcg is not what we want.  So, we fix
> > > > > > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> > > > > >
> > > > > > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > > > > > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> > > > > >
> > > > > > The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> > > > > > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > > > > > correctly.
> > > > > >
> > > > > > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > > > > > using the parent sk->sk_memcg.
> > > > > >
> > > > > > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > > > > > Suggested-by: Michal Koutný <mkoutny@suse.com>
> > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > > > > ---
> > > > > >  mm/memcontrol.c     |  5 ++++-
> > > > > >  net/mptcp/subflow.c | 11 +++--------
> > > > > >  2 files changed, 7 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > > index 8dd7fbed5a94..450862e7fd7a 100644
> > > > > > --- a/mm/memcontrol.c
> > > > > > +++ b/mm/memcontrol.c
> > > > > > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > > > > >       if (!in_task())
> > > > > >               return;
> > > > > >
> > > > > > +     memcg = current->active_memcg;
> > > > > > +
> > > > >
> > > > > Use active_memcg() instead of current->active_memcg and do before the
> > > > > !in_task() check.
> > > >
> > > > Why not reuse the !in_task() check here ?
> > > > We never use int_active_memcg for socket and also
> > > > know int_active_memcg is always NULL here.
> > > >
> > >
> > > If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
> > > infra then make it work for both in_task() and !in_task() contexts.
> >
> > Considering e876ecc67db80, then I think we should add
> > set_active_memcg_in_task() and active_memcg_in_task().
> >
> > or at least we need WARN_ON() if we want to place active_memcg()
> > before the in_task() check, but this looks ugly.
> >
> >         memcg = active_memcg();
> >         if (!in_task() && !memcg)
> >                 return;
> >         DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg))
>
> You don't have to use the code as is. It is just an example. Basically I
> am asking if in future someone does the following:
>
>         // in !in_task() context
>         old_memcg = set_active_memcg(new_memcg);
>         sk = sk_alloc();
>         set_active_memcg(old_memcg);
>
> mem_cgroup_sk_alloc() should work and associate the sk with the
> new_memcg.
>
> You can manually inline active_memcg() function to avoid multiple
> in_task() checks like below:

Will do so, thanks!


>
> void mem_cgroup_sk_alloc(struct sock *sk)
> {
>         struct mem_cgroup *memcg;
>
>         if (!mem_cgroup_sockets_enabled)
>                 return;
>
>         if (!in_task()) {
>                 memcg = this_cpu_read(int_active_memcg);
>
>                 /*
>                  * Do not associate the sock with unrelated interrupted
>                  * task's memcg.
>                  */
>                 if (!memcg)
>                         return;
>         } else {
>                 memcg = current->active_memcg;
>         }
>
>         rcu_read_lock();
>         if (likely(!memcg))
>                 memcg = mem_cgroup_from_task(current);
>         if (mem_cgroup_is_root(memcg))
>                 goto out;
>         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg1_tcpmem_active(memcg))
>                 goto out;
>         if (css_tryget(&memcg->css))
>                 sk->sk_memcg = memcg;
> out:
>         rcu_read_unlock();
> }
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-15  2:31             ` Kuniyuki Iwashima
@ 2025-08-15 17:24               ` Kuniyuki Iwashima
  2025-08-15 17:30                 ` Matthieu Baerts
  2025-08-15 17:39                 ` Michal Koutný
  0 siblings, 2 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-15 17:24 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Matthieu Baerts, Mat Martineau,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Andrew Morton,
	Michal Koutný, Tejun Heo, Simon Horman, Geliang Tang,
	Muchun Song, Mina Almasry, Kuniyuki Iwashima, netdev, mptcp,
	cgroups, linux-mm

On Thu, Aug 14, 2025 at 7:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Thu, Aug 14, 2025 at 6:06 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Thu, Aug 14, 2025 at 05:05:56PM -0700, Kuniyuki Iwashima wrote:
> > > On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
> > > > > On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > > > >
> > > > > > On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
> > > > > > > When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
> > > > > > > sk->sk_memcg based on the current task.
> > > > > > >
> > > > > > > MPTCP subflow socket creation is triggered from userspace or
> > > > > > > an in-kernel worker.
> > > > > > >
> > > > > > > In the latter case, sk->sk_memcg is not what we want.  So, we fix
> > > > > > > it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
> > > > > > >
> > > > > > > Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
> > > > > > > under #ifdef CONFIG_SOCK_CGROUP_DATA.
> > > > > > >
> > > > > > > The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
> > > > > > > CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
> > > > > > > correctly.
> > > > > > >
> > > > > > > Let's wrap sock_create_kern() for subflow with set_active_memcg()
> > > > > > > using the parent sk->sk_memcg.
> > > > > > >
> > > > > > > Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
> > > > > > > Suggested-by: Michal Koutný <mkoutny@suse.com>
> > > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > > > > > ---
> > > > > > >  mm/memcontrol.c     |  5 ++++-
> > > > > > >  net/mptcp/subflow.c | 11 +++--------
> > > > > > >  2 files changed, 7 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > > > index 8dd7fbed5a94..450862e7fd7a 100644
> > > > > > > --- a/mm/memcontrol.c
> > > > > > > +++ b/mm/memcontrol.c
> > > > > > > @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > > > > > >       if (!in_task())
> > > > > > >               return;
> > > > > > >
> > > > > > > +     memcg = current->active_memcg;
> > > > > > > +
> > > > > >
> > > > > > Use active_memcg() instead of current->active_memcg and do before the
> > > > > > !in_task() check.
> > > > >
> > > > > Why not reuse the !in_task() check here ?
> > > > > We never use int_active_memcg for socket and also
> > > > > know int_active_memcg is always NULL here.
> > > > >
> > > >
> > > > If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
> > > > infra then make it work for both in_task() and !in_task() contexts.
> > >
> > > Considering e876ecc67db80, then I think we should add
> > > set_active_memcg_in_task() and active_memcg_in_task().
> > >
> > > or at least we need WARN_ON() if we want to place active_memcg()
> > > before the in_task() check, but this looks ugly.
> > >
> > >         memcg = active_memcg();
> > >         if (!in_task() && !memcg)
> > >                 return;
> > >         DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg))
> >
> > You don't have to use the code as is. It is just an example. Basically I
> > am asking if in future someone does the following:
> >
> >         // in !in_task() context
> >         old_memcg = set_active_memcg(new_memcg);
> >         sk = sk_alloc();
> >         set_active_memcg(old_memcg);
> >
> > mem_cgroup_sk_alloc() should work and associate the sk with the
> > new_memcg.
> >
> > You can manually inline active_memcg() function to avoid multiple
> > in_task() checks like below:
>
> Will do so, thanks!

I noticed this won't work with the bpf approach as the
hook is only called for !sk_kern socket (MPTCP subflow
is sk_kern == 1) and we need to manually copy the
memcg anyway.. so I'll use the original patch 1 in the
next version.


> >
> > void mem_cgroup_sk_alloc(struct sock *sk)
> > {
> >         struct mem_cgroup *memcg;
> >
> >         if (!mem_cgroup_sockets_enabled)
> >                 return;
> >
> >         if (!in_task()) {
> >                 memcg = this_cpu_read(int_active_memcg);
> >
> >                 /*
> >                  * Do not associate the sock with unrelated interrupted
> >                  * task's memcg.
> >                  */
> >                 if (!memcg)
> >                         return;
> >         } else {
> >                 memcg = current->active_memcg;
> >         }
> >
> >         rcu_read_lock();
> >         if (likely(!memcg))
> >                 memcg = mem_cgroup_from_task(current);
> >         if (mem_cgroup_is_root(memcg))
> >                 goto out;
> >         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg1_tcpmem_active(memcg))
> >                 goto out;
> >         if (css_tryget(&memcg->css))
> >                 sk->sk_memcg = memcg;
> > out:
> >         rcu_read_unlock();
> > }
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-15 17:24               ` Kuniyuki Iwashima
@ 2025-08-15 17:30                 ` Matthieu Baerts
  2025-08-15 17:39                 ` Michal Koutný
  1 sibling, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-08-15 17:30 UTC (permalink / raw)
  To: Kuniyuki Iwashima, Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Neal Cardwell,
	Paolo Abeni, Willem de Bruijn, Mat Martineau, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Andrew Morton, Michal Koutný,
	Tejun Heo, Simon Horman, Geliang Tang, Muchun Song, Mina Almasry,
	Kuniyuki Iwashima, netdev, mptcp, cgroups, linux-mm

Hi Kuniyuki,

On 15/08/2025 19:24, Kuniyuki Iwashima wrote:
> On Thu, Aug 14, 2025 at 7:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>
>> On Thu, Aug 14, 2025 at 6:06 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>
>>> On Thu, Aug 14, 2025 at 05:05:56PM -0700, Kuniyuki Iwashima wrote:
>>>> On Thu, Aug 14, 2025 at 4:46 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>>>
>>>>> On Thu, Aug 14, 2025 at 04:27:31PM -0700, Kuniyuki Iwashima wrote:
>>>>>> On Thu, Aug 14, 2025 at 2:44 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>>>>>
>>>>>>> On Thu, Aug 14, 2025 at 08:08:33PM +0000, Kuniyuki Iwashima wrote:
>>>>>>>> When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets
>>>>>>>> sk->sk_memcg based on the current task.
>>>>>>>>
>>>>>>>> MPTCP subflow socket creation is triggered from userspace or
>>>>>>>> an in-kernel worker.
>>>>>>>>
>>>>>>>> In the latter case, sk->sk_memcg is not what we want.  So, we fix
>>>>>>>> it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup().
>>>>>>>>
>>>>>>>> Although the code is placed under #ifdef CONFIG_MEMCG, it is buried
>>>>>>>> under #ifdef CONFIG_SOCK_CGROUP_DATA.
>>>>>>>>
>>>>>>>> The two configs are orthogonal.  If CONFIG_MEMCG is enabled without
>>>>>>>> CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> Let's wrap sock_create_kern() for subflow with set_active_memcg()
>>>>>>>> using the parent sk->sk_memcg.
>>>>>>>>
>>>>>>>> Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup")
>>>>>>>> Suggested-by: Michal Koutný <mkoutny@suse.com>
>>>>>>>> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
>>>>>>>> ---
>>>>>>>>  mm/memcontrol.c     |  5 ++++-
>>>>>>>>  net/mptcp/subflow.c | 11 +++--------
>>>>>>>>  2 files changed, 7 insertions(+), 9 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>>>>> index 8dd7fbed5a94..450862e7fd7a 100644
>>>>>>>> --- a/mm/memcontrol.c
>>>>>>>> +++ b/mm/memcontrol.c
>>>>>>>> @@ -5006,8 +5006,11 @@ void mem_cgroup_sk_alloc(struct sock *sk)
>>>>>>>>       if (!in_task())
>>>>>>>>               return;
>>>>>>>>
>>>>>>>> +     memcg = current->active_memcg;
>>>>>>>> +
>>>>>>>
>>>>>>> Use active_memcg() instead of current->active_memcg and do before the
>>>>>>> !in_task() check.
>>>>>>
>>>>>> Why not reuse the !in_task() check here ?
>>>>>> We never use int_active_memcg for socket and also
>>>>>> know int_active_memcg is always NULL here.
>>>>>>
>>>>>
>>>>> If we are making mem_cgroup_sk_alloc() work with set_active_memcg()
>>>>> infra then make it work for both in_task() and !in_task() contexts.
>>>>
>>>> Considering e876ecc67db80, then I think we should add
>>>> set_active_memcg_in_task() and active_memcg_in_task().
>>>>
>>>> or at least we need WARN_ON() if we want to place active_memcg()
>>>> before the in_task() check, but this looks ugly.
>>>>
>>>>         memcg = active_memcg();
>>>>         if (!in_task() && !memcg)
>>>>                 return;
>>>>         DEBUG_NET_WARN_ON_ONCE(!in_task() && memcg))
>>>
>>> You don't have to use the code as is. It is just an example. Basically I
>>> am asking if in future someone does the following:
>>>
>>>         // in !in_task() context
>>>         old_memcg = set_active_memcg(new_memcg);
>>>         sk = sk_alloc();
>>>         set_active_memcg(old_memcg);
>>>
>>> mem_cgroup_sk_alloc() should work and associate the sk with the
>>> new_memcg.
>>>
>>> You can manually inline active_memcg() function to avoid multiple
>>> in_task() checks like below:
>>
>> Will do so, thanks!
> 
> I noticed this won't work with the bpf approach as the
> hook is only called for !sk_kern socket (MPTCP subflow
> is sk_kern == 1) and we need to manually copy the
> memcg anyway.. so I'll use the original patch 1 in the
> next version.

Thank you for having checked that!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  2025-08-15 17:24               ` Kuniyuki Iwashima
  2025-08-15 17:30                 ` Matthieu Baerts
@ 2025-08-15 17:39                 ` Michal Koutný
  1 sibling, 0 replies; 28+ messages in thread
From: Michal Koutný @ 2025-08-15 17:39 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Shakeel Butt, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Neal Cardwell, Paolo Abeni, Willem de Bruijn, Matthieu Baerts,
	Mat Martineau, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Andrew Morton, Tejun Heo, Simon Horman, Geliang Tang, Muchun Song,
	Mina Almasry, Kuniyuki Iwashima, netdev, mptcp, cgroups, linux-mm

[-- Attachment #1: Type: text/plain, Size: 196 bytes --]

On Fri, Aug 15, 2025 at 10:24:12AM -0700, Kuniyuki Iwashima <kuniyu@google.com> wrote:
> and we need to manually copy the memcg anyway..

(Is that in a later patch?)

But fair enough :-/


Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-08-15 17:39 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-14 20:08 [PATCH v4 net-next 00/10] net-memcg: Gather memcg code under CONFIG_MEMCG Kuniyuki Iwashima
2025-08-14 20:08 ` [PATCH v4 net-next 01/10] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n Kuniyuki Iwashima
2025-08-14 21:44   ` Shakeel Butt
2025-08-14 23:27     ` Kuniyuki Iwashima
2025-08-14 23:46       ` Shakeel Butt
2025-08-15  0:05         ` Kuniyuki Iwashima
2025-08-15  1:05           ` Shakeel Butt
2025-08-15  2:31             ` Kuniyuki Iwashima
2025-08-15 17:24               ` Kuniyuki Iwashima
2025-08-15 17:30                 ` Matthieu Baerts
2025-08-15 17:39                 ` Michal Koutný
2025-08-14 20:08 ` [PATCH v4 net-next 02/10] mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready() Kuniyuki Iwashima
2025-08-14 20:08 ` [PATCH v4 net-next 03/10] tcp: Simplify error path in inet_csk_accept() Kuniyuki Iwashima
2025-08-14 20:08 ` [PATCH v4 net-next 04/10] net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV Kuniyuki Iwashima
2025-08-14 20:08 ` [PATCH v4 net-next 05/10] net: Clean up __sk_mem_raise_allocated() Kuniyuki Iwashima
2025-08-14 20:08 ` [PATCH v4 net-next 06/10] net-memcg: Introduce mem_cgroup_from_sk() Kuniyuki Iwashima
2025-08-14 21:51   ` Shakeel Butt
2025-08-14 20:08 ` [PATCH v4 net-next 07/10] net-memcg: Introduce mem_cgroup_sk_enabled() Kuniyuki Iwashima
2025-08-14 21:51   ` Shakeel Butt
2025-08-14 20:08 ` [PATCH v4 net-next 08/10] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge() Kuniyuki Iwashima
2025-08-14 21:54   ` Shakeel Butt
2025-08-14 20:08 ` [PATCH v4 net-next 09/10] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure() Kuniyuki Iwashima
2025-08-14 22:00   ` Shakeel Butt
2025-08-14 22:10     ` Shakeel Butt
2025-08-14 23:22       ` Kuniyuki Iwashima
2025-08-14 20:08 ` [PATCH v4 net-next 10/10] net: Define sk_memcg under CONFIG_MEMCG Kuniyuki Iwashima
2025-08-14 20:21   ` Roman Gushchin
2025-08-14 22:10   ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).