netdev.vger.kernel.org archive mirror
* [PATCH v3] memcg: expose socket memory pressure in a cgroup
@ 2025-07-22  7:11 Daniel Sedlak
  2025-07-22  7:17 ` Eric Dumazet
  2025-07-22  8:57 ` Michal Koutný
  0 siblings, 2 replies; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-22  7:11 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, Kuniyuki Iwashima,
	David Ahern, Andrew Morton, Shakeel Butt, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups
  Cc: Daniel Sedlak, Matyas Hurtik

This patch is the result of a long-running debugging effort that started
as "networking is slow": TCP throughput suddenly dropped from tens of
Gbps to a few Mbps, and nothing showed up in the kernel log or the
netstat counters.

Currently, there are two memory pressure counters for TCP sockets [1],
which are updated only when memory pressure is signaled through the
proto struct [2]. However, memory pressure can also be signaled through
the cgroup memory subsystem, and that is not reflected in the netstat
counters. As a result, when the cgroup memory subsystem signals that it
is under pressure, we silently reduce the advertised TCP window to
4*advmss with tcp_adjust_rcv_ssthresh(), which causes a significant
throughput reduction.
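
To give a sense of the size of that clamp, here is a minimal user-space
sketch. It is not the kernel code: it assumes the clamp amounts to taking
the minimum of the current receive threshold and 4*advmss, as described
above. The function name clamp_rcv_ssthresh() and the example values are
hypothetical.

```c
#include <stdint.h>

/*
 * User-space model of the clamp described above: under socket memory
 * pressure the receive window threshold is capped at 4 * advmss.
 * clamp_rcv_ssthresh() is a hypothetical name used for illustration only.
 */
static inline uint32_t clamp_rcv_ssthresh(uint32_t rcv_ssthresh, uint32_t advmss)
{
	uint32_t cap = 4U * advmss;

	/* Never raise the threshold; only lower it to the cap. */
	return rcv_ssthresh < cap ? rcv_ssthresh : cap;
}
```

With an assumed advmss of 1460 bytes, a threshold of 1 MiB is clamped to
5840 bytes, so the advertised window is limited to about four full-sized
segments.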

Keep in mind that when the cgroup memory subsystem signals the socket
memory pressure, it affects all sockets used in that cgroup.

This patch exposes a new file for each non-root cgroup in the cgroup
filesystem which signals the cgroup socket memory pressure. The file is
accessible at the following path:

  /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure

The file reports the raw value of the socket_pressure field of struct
mem_cgroup, printed as an unsigned integer. The field is a re-armable
deadline in jiffies that marks the end of the current socket memory
pressure period; each time it is re-armed it is set to jiffies + HZ,
that is, one second ahead.
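
The re-arm semantics can be modelled in user space as follows. This is a
sketch under stated assumptions: it assumes a cgroup counts as under
socket pressure while the current jiffies value is before
socket_pressure, compared in a wraparound-safe way. The names
rearm_socket_pressure() and under_socket_pressure() and the MODEL_HZ
value of 1000 are hypothetical stand-ins, not kernel API.

```c
#include <stdbool.h>

#define MODEL_HZ 1000UL	/* assumed tick rate; the real HZ is a kernel config option */

/* Re-arm the deadline: pressure lasts until MODEL_HZ ticks (one second) from now. */
static inline unsigned long rearm_socket_pressure(unsigned long now)
{
	return now + MODEL_HZ;	/* unsigned arithmetic wraps, like a tick counter */
}

/*
 * Wraparound-safe "now is before deadline" test, in the style of the
 * kernel's time_before(): compare the signed difference, not the raw values.
 */
static inline bool under_socket_pressure(unsigned long now, unsigned long deadline)
{
	return (long)(now - deadline) < 0;
}
```

Comparing the signed difference rather than the raw values keeps the
check correct when the tick counter wraps around.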

Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/uapi/linux/snmp.h#L231-L232 [1]
Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/net/sock.h#L1300-L1301 [2]
Co-developed-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
Signed-off-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
Signed-off-by: Daniel Sedlak <daniel.sedlak@cdn77.com>
---
Changes:
v2 -> v3:
- Expose the socket memory pressure on the cgroups instead of netstat
- Split patch
- Link: https://lore.kernel.org/netdev/20250714143613.42184-1-daniel.sedlak@cdn77.com/

v1 -> v2:
- Add tracepoint
- Link: https://lore.kernel.org/netdev/20250707105205.222558-1-daniel.sedlak@cdn77.com/


 mm/memcontrol.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 902da8a9c643..8e8808fb2d7a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4647,6 +4647,15 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+static int memory_socket_pressure_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	seq_printf(m, "%lu\n", READ_ONCE(memcg->socket_pressure));
+
+	return 0;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -4718,6 +4727,11 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
+	{
+		.name = "net.socket_pressure",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_socket_pressure_show,
+	},
 	{ }	/* terminate */
 };
 

base-commit: e96ee511c906c59b7c4e6efd9d9b33917730e000
-- 
2.39.5



Thread overview: 20+ messages
2025-07-22  7:11 [PATCH v3] memcg: expose socket memory pressure in a cgroup Daniel Sedlak
2025-07-22  7:17 ` Eric Dumazet
2025-07-22  7:27   ` Daniel Sedlak
2025-07-22  8:57 ` Michal Koutný
2025-07-22 17:50   ` Shakeel Butt
2025-07-22 18:27     ` Kuniyuki Iwashima
2025-07-22 18:41       ` Waiman Long
2025-07-22 18:49         ` Kuniyuki Iwashima
2025-07-22 19:05       ` Shakeel Butt
2025-07-22 19:58         ` Kuniyuki Iwashima
2025-07-22 20:11           ` Shakeel Butt
2025-07-22 22:10             ` Kuniyuki Iwashima
2025-07-23  8:38             ` Michal Koutný
2025-07-23  8:58               ` Daniel Sedlak
2025-07-23 17:54                 ` Shakeel Butt
2025-07-24  8:43                   ` Daniel Sedlak
2025-07-25  0:44                     ` Tejun Heo
2025-07-28 11:29                       ` Daniel Sedlak
2025-07-30  0:15                         ` Tejun Heo
2025-07-23  8:41         ` Daniel Sedlak
