From: Kuniyuki Iwashima <kuniyu@google.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	Neal Cardwell <ncardwell@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	 Willem de Bruijn <willemb@google.com>,
	Matthieu Baerts <matttbe@kernel.org>,
	 Mat Martineau <martineau@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Simon Horman <horms@kernel.org>,
	Geliang Tang <geliang@kernel.org>,
	 Muchun Song <muchun.song@linux.dev>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	 Kuniyuki Iwashima <kuni1840@gmail.com>,
	netdev@vger.kernel.org, mptcp@lists.linux.dev,
	 cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 net-next 11/13] net-memcg: Add memory.socket_isolated knob.
Date: Mon, 21 Jul 2025 20:35:30 +0000
Message-ID: <20250721203624.3807041-12-kuniyu@google.com>
In-Reply-To: <20250721203624.3807041-1-kuniyu@google.com>

Some networking protocols (e.g., TCP, UDP) have their own global memory
accounting, and that memory is also charged to memcg as "sock" in
memory.stat.

Such sockets are subject to the global limit and are thus affected by
a noisy neighbour outside the cgroup.

A later patch in this series will decouple memcg from the global
memory accounting when configured.

This patch adds the per-memcg knob, memory.socket_isolated, that
controls it.

The value will be saved in each socket at creation time and will
persist through the socket's lifetime.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
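Not part of the patch: a minimal user-space sketch of how the knob
might be toggled, assuming cgroup v2 is mounted and the running kernel
carries this series.  The helper names set_socket_isolated() and
get_socket_isolated() are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: build "<cgroup_dir>/memory.socket_isolated". */
static int knob_path(char *path, size_t len, const char *cgroup_dir)
{
	int n = snprintf(path, len, "%s/memory.socket_isolated", cgroup_dir);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write 0 or 1 to the knob; returns 0 on success. */
int set_socket_isolated(const char *cgroup_dir, int enable)
{
	char path[4096];
	FILE *f;

	/* The write handler in this patch accepts only 0 and 1. */
	if (enable != 0 && enable != 1)
		return -1;
	if (knob_path(path, sizeof(path), cgroup_dir))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", enable) < 0) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write, so a kernel error may surface here. */
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the knob back; returns 0, 1, or -1 on error. */
int get_socket_isolated(const char *cgroup_dir)
{
	char path[4096];
	FILE *f;
	int n, value;

	if (knob_path(path, sizeof(path), cgroup_dir))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%d", &value);
	fclose(f);
	return n == 1 ? value : -1;
}
```

Because the value is saved in each socket when it is created, a caller
would presumably run something like
set_socket_isolated("/sys/fs/cgroup/mygroup", 1) (the path is an
example only) before the workload opens its sockets.
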
 Documentation/admin-guide/cgroup-v2.rst | 16 +++++++++++
 include/linux/memcontrol.h              |  6 ++++
 include/net/sock.h                      |  3 ++
 mm/memcontrol.c                         | 37 +++++++++++++++++++++++++
 4 files changed, 62 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bd98ea3175ec1..2428707b7d27d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1878,6 +1878,22 @@ The following nested keys are defined.
 	Shows pressure stall information for memory. See
 	:ref:`Documentation/accounting/psi.rst <psi>` for details.
 
+  memory.socket_isolated
+	A read-write single value file which exists on non-root cgroups.
+	The default value is "0".
+
+	Some networking protocols (e.g., TCP, UDP) implement their own memory
+	accounting for socket buffers.
+
+	This memory is also charged to a non-root cgroup as sock in memory.stat.
+
+	Since per-protocol limits such as /proc/sys/net/ipv4/tcp_mem and
+	/proc/sys/net/ipv4/udp_mem are global, memory allocation for socket
+	buffers may fail even when the cgroup has available memory.
+
+	Sockets created with socket_isolated set to 1 are no longer subject
+	to these global protocol limits.
+
 
 Usage Guidelines
 ~~~~~~~~~~~~~~~~
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 211712ec57d1a..7d5d43e3b49e6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -226,6 +226,12 @@ struct mem_cgroup {
 	 */
 	bool oom_group;
 
+	/*
+	 * If set, MEMCG_SOCK memory is charged on memcg only,
+	 * otherwise, memcg and sk->sk_prot->memory_allocated.
+	 */
+	bool socket_isolated;
+
 	int swappiness;
 
 	/* memory.events and memory.events.local */
diff --git a/include/net/sock.h b/include/net/sock.h
index 16fe0e5afc587..5e8c73731531c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2597,6 +2597,9 @@ static inline gfp_t gfp_memcg_charge(void)
 }
 
 #ifdef CONFIG_MEMCG
+
+#define MEMCG_SOCK_ISOLATED	1UL
+
 static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk)
 {
 	return sk->sk_memcg;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d7f4e31f4e625..0a55c12a6679b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4645,6 +4645,37 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+static int memory_socket_isolated_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	seq_printf(m, "%d\n", READ_ONCE(memcg->socket_isolated));
+
+	return 0;
+}
+
+static ssize_t memory_socket_isolated_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	int ret, socket_isolated;
+
+	buf = strstrip(buf);
+	if (!*buf)
+		return -EINVAL;
+
+	ret = kstrtoint(buf, 0, &socket_isolated);
+	if (ret)
+		return ret;
+
+	if (socket_isolated != 0 && socket_isolated != MEMCG_SOCK_ISOLATED)
+		return -EINVAL;
+
+	WRITE_ONCE(memcg->socket_isolated, socket_isolated);
+
+	return nbytes;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -4716,6 +4747,12 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
+	{
+		.name = "socket_isolated",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_socket_isolated_show,
+		.write = memory_socket_isolated_write,
+	},
 	{ }	/* terminate */
 };
 
-- 
2.50.0.727.gbf7dc18ff4-goog
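
For readers who want to experiment with the input rules outside the
kernel, the validation in memory_socket_isolated_write() can be
approximated in user space.  This is only a sketch: it substitutes
strtol() for kstrtoint(), trims whitespace by hand instead of calling
strstrip(), and collapses every failure to -EINVAL, whereas the kernel
helper may return other error codes.  The function name
parse_socket_isolated() is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space approximation of the knob's input check.
 * Stores 0 or 1 in *value and returns 0, or returns -EINVAL.
 */
int parse_socket_isolated(const char *input, int *value)
{
	char buf[32];
	char *start, *end;
	size_t len = strlen(input);
	long v;

	if (len >= sizeof(buf))
		return -EINVAL;
	memcpy(buf, input, len + 1);

	/* Trim leading and trailing whitespace, similar to strstrip(). */
	start = buf;
	while (isspace((unsigned char)*start))
		start++;
	end = start + strlen(start);
	while (end > start && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';

	if (*start == '\0')
		return -EINVAL;

	/* Base 0 auto-detects decimal, octal, and hex, as in the patch. */
	errno = 0;
	v = strtol(start, &end, 0);
	if (errno || *end != '\0')
		return -EINVAL;

	/* Only 0 and 1 (MEMCG_SOCK_ISOLATED) are valid values. */
	if (v != 0 && v != 1)
		return -EINVAL;

	*value = (int)v;
	return 0;
}
```

With this sketch, inputs such as "1\n" and " 0 " are accepted, while
"2", "abc", "1x", and the empty string are rejected.
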
