From: Kuniyuki Iwashima
Date: Wed, 16 Jul 2025 11:07:44 -0700
Subject: Re: [PATCH v2 net-next 1/2] tcp: account for memory pressure signaled by cgroup
To: Shakeel Butt
Cc: Daniel Sedlak, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Simon Horman, Jonathan Corbet, Neal Cardwell,
    David Ahern, Andrew Morton, Yosry Ahmed, linux-mm@kvack.org,
    netdev@vger.kernel.org, Matyas Hurtik

On Wed, Jul 16, 2025 at 9:50 AM Shakeel Butt wrote:
>
> On Mon, Jul 14, 2025 at 04:36:12PM +0200, Daniel Sedlak wrote:
> > This patch is a result of our long-standing debug sessions, where it all
> > started as "networking is slow", and TCP network throughput suddenly
> > dropped from tens of Gbps to a few Mbps, and we could not see anything in
> > the kernel log or netstat counters.
> >
> > Currently, we have two memory pressure counters for TCP sockets [1],
> > which we manipulate only when the memory pressure is signalled through
> > the proto struct [2]. However, the memory pressure can also be signaled
> > through the cgroup memory subsystem, which we do not reflect in the
> > netstat counters. In the end, when the cgroup memory subsystem signals
> > that it is under pressure, we silently reduce the advertised TCP window
> > with tcp_adjust_rcv_ssthresh() to 4*advmss, which causes a significant
> > throughput reduction.
> >
> > So this patch adds a new counter to account for memory pressure
> > signaled by the memory cgroup, so it is much easier to spot.
> >
> > Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/uapi/linux/snmp.h#L231-L232 [1]
> > Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/net/sock.h#L1300-L1301 [2]
> > Co-developed-by: Matyas Hurtik
> > Signed-off-by: Matyas Hurtik
> > Signed-off-by: Daniel Sedlak
> > ---
> >  Documentation/networking/net_cachelines/snmp.rst |  1 +
> >  include/net/tcp.h                                | 14 ++++++++------
> >  include/uapi/linux/snmp.h                        |  1 +
> >  net/ipv4/proc.c                                  |  1 +
> >  4 files changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/networking/net_cachelines/snmp.rst b/Documentation/networking/net_cachelines/snmp.rst
> > index bd44b3eebbef..ed17ff84e39c 100644
> > --- a/Documentation/networking/net_cachelines/snmp.rst
> > +++ b/Documentation/networking/net_cachelines/snmp.rst
> > @@ -76,6 +76,7 @@ unsigned_long LINUX_MIB_TCPABORTONLINGER
> >  unsigned_long LINUX_MIB_TCPABORTFAILED
> >  unsigned_long LINUX_MIB_TCPMEMORYPRESSURES
> >  unsigned_long LINUX_MIB_TCPMEMORYPRESSURESCHRONO
> > +unsigned_long LINUX_MIB_TCPCGROUPSOCKETPRESSURE
> >  unsigned_long LINUX_MIB_TCPSACKDISCARD
> >  unsigned_long LINUX_MIB_TCPDSACKIGNOREDOLD
> >  unsigned_long LINUX_MIB_TCPDSACKIGNOREDNOUNDO
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 761c4a0ad386..aae3efe24282 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -267,6 +267,11 @@ extern long sysctl_tcp_mem[3];
> >  #define TCP_RACK_STATIC_REO_WND	0x2 /* Use static RACK reo wnd */
> >  #define TCP_RACK_NO_DUPTHRESH	0x4 /* Do not use DUPACK threshold in RACK */
> >
> > +#define TCP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.tcp_statistics, field)
> > +#define __TCP_INC_STATS(net, field)	__SNMP_INC_STATS((net)->mib.tcp_statistics, field)
> > +#define TCP_DEC_STATS(net, field)	SNMP_DEC_STATS((net)->mib.tcp_statistics, field)
> > +#define TCP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val)
> > +
> >  extern atomic_long_t tcp_memory_allocated;
> >  DECLARE_PER_CPU(int, tcp_memory_per_cpu_fw_alloc);
> >
> > @@ -277,8 +282,10 @@ extern unsigned long tcp_memory_pressure;
> >  static inline bool tcp_under_memory_pressure(const struct sock *sk)
> >  {
> >  	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
> > -	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
> > +	    mem_cgroup_under_socket_pressure(sk->sk_memcg)) {
> > +		TCP_INC_STATS(sock_net(sk), LINUX_MIB_TCPCGROUPSOCKETPRESSURE);
> >  		return true;
>
> Incrementing it here will give a very different semantic to this stat
> compared to LINUX_MIB_TCPMEMORYPRESSURES. Here the increments mean the
> number of times the kernel checks if a given socket is under memcg
> pressure for a net namespace. Is that what we want?

I'm trying to decouple sk_memcg from the global tcp_memory_allocated, as
you and Wei planned before. The two accountings have had different
semantics since day one and will keep them, so a new stat with different
semantics would be fine.

But I think a per-memcg stat like memory.stat.XXX would be a better fit
than a per-netns one, because one netns could be shared by multiple
cgroups, and multiple sockets in the same cgroup could be spread across
multiple netns.
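
To make the semantic difference raised in this thread concrete, here is a minimal user-space sketch (plain C, not kernel code; struct pressure_counters and replay() are hypothetical names). It replays a made-up pressure timeline and counts it two ways: once per positive check, which is what incrementing inside tcp_under_memory_pressure() amounts to, and once per entry into pressure, which is our reading of how tcp_enter_memory_pressure() bumps LINUX_MIB_TCPMEMORYPRESSURES for the global, proto-level pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters for illustration only; not kernel structures. */
struct pressure_counters {
	unsigned long per_check;   /* +1 for every check that finds pressure */
	unsigned long per_episode; /* +1 only on a "not pressured -> pressured" edge */
};

/*
 * Replay a made-up timeline. pressured[i] says whether the memcg reports
 * socket pressure during interval i; checks[i] is how many times the stack
 * asks "is this socket under memory pressure?" during that interval.
 */
void replay(const bool *pressured, const unsigned int *checks, size_t n,
	    struct pressure_counters *c)
{
	bool prev = false;

	assert(c != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pressured[i] && !prev)
			c->per_episode++;          /* counted once per episode */
		if (pressured[i])
			c->per_check += checks[i]; /* scales with traffic volume */
		prev = pressured[i];
	}
}
```

With pressured = {false, true, true, false, true} and checks = {10, 500, 800, 20, 300}, per_episode ends at 2 while per_check ends at 1600: the per-check number grows with how busy the sockets are, not with how often the memcg actually entered pressure.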
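
For a rough sense of why the clamp described in the quoted commit message hurts so much, the sketch below applies min(rcv_ssthresh, 4 * advmss) and the usual one-window-per-RTT throughput bound. Both helpers are hypothetical, the clamp is our reading of the commit message rather than a copy of tcp_adjust_rcv_ssthresh(), and the numbers assume advmss = 1448 bytes (a 1500-byte MTU with TCP timestamps) while ignoring window scaling and the rest of the receive-window logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the clamp as described, min(rcv_ssthresh, 4 * advmss). */
uint32_t clamp_rcv_ssthresh(uint32_t rcv_ssthresh, uint32_t advmss)
{
	uint32_t cap = 4U * advmss;

	return rcv_ssthresh < cap ? rcv_ssthresh : cap;
}

/*
 * Hypothetical helper: upper bound on one flow's throughput when at most
 * window_bytes can be outstanding per round trip. Result is in Mbit/s.
 */
double max_throughput_mbps(uint32_t window_bytes, double rtt_ms)
{
	assert(rtt_ms > 0.0);
	/* bits per RTT divided by RTT in ms gives kbit/s; /1000 gives Mbit/s */
	return (double)window_bytes * 8.0 / rtt_ms / 1000.0;
}
```

With advmss = 1448 the cap is 5792 bytes, so a single flow tops out at roughly 46 Mbps with a 1 ms RTT and roughly 4.6 Mbps with a 10 ms RTT, the same order of magnitude as the drop to a few Mbps described in the commit message.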
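
The per-memcg vs. per-netns argument can be illustrated with a toy aggregation. In this sketch, struct sock_info, aggregate() and MAX_IDS are hypothetical, and memcgs and netns are reduced to small integer IDs; it only shows how the same per-socket events look when summed under each key, not how either kernel counter would actually be implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 8 /* hypothetical upper bound on memcg/netns IDs in this toy */

/* Hypothetical per-socket record: owning memcg, owning netns, events seen. */
struct sock_info {
	unsigned int memcg_id;
	unsigned int netns_id;
	unsigned long pressure_events;
};

/* Sum the same pressure events two ways: keyed by memcg and keyed by netns. */
void aggregate(const struct sock_info *socks, size_t n,
	       unsigned long per_memcg[MAX_IDS],
	       unsigned long per_netns[MAX_IDS])
{
	for (size_t i = 0; i < MAX_IDS; i++) {
		per_memcg[i] = 0;
		per_netns[i] = 0;
	}

	for (size_t i = 0; i < n; i++) {
		const struct sock_info *s = &socks[i];

		assert(s->memcg_id < MAX_IDS && s->netns_id < MAX_IDS);
		per_memcg[s->memcg_id] += s->pressure_events;
		per_netns[s->netns_id] += s->pressure_events;
	}
}
```

With sockets {memcg 0, netns 0, 5 events}, {memcg 0, netns 1, 7 events} and {memcg 1, netns 0, 0 events}, the per-memcg view reports 12 for memcg 0 and 0 for memcg 1, while the per-netns view reports 5 and 7: memcg 0's pressure is split across two namespaces, and netns 0's counter mixes it with a memcg that never saw pressure.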