From: Ivan Babrou
Date: Tue, 22 Nov 2022 10:11:41 -0800
Subject: Re: Low TCP throughput due to vmpressure with swap enabled
To: Eric Dumazet
Cc: Linux MM, Linux Kernel Network Developers, linux-kernel,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Andrew Morton, "David S. Miller", Hideaki YOSHIFUJI,
    David Ahern, Jakub Kicinski, Paolo Abeni, cgroups@vger.kernel.org,
    kernel-team

On Tue, Nov 22, 2022 at 10:01 AM Eric Dumazet wrote:
>
> On Mon, Nov 21, 2022 at 4:53 PM Ivan Babrou wrote:
> >
> > Hello,
> >
> > We have observed a negative TCP throughput behavior from the following commit:
> >
> > * 8e8ae645249b mm: memcontrol: hook up vmpressure to socket pressure
> >
> > It landed back in 2016 in v4.5, so it's not exactly a new issue.
> >
> > The crux of the issue is that in some cases with swap present the
> > workload can be unfairly throttled in terms of TCP throughput.
>
> I guess defining 'fairness' in such a scenario is nearly impossible.
>
> Have you tried changing /proc/sys/net/ipv4/tcp_rmem (and/or tcp_wmem) ?
> Defaults are quite conservative.

Yes, our max sizes are much higher than the defaults. I don't see how
it matters though. The issue is that the kernel clamps rcv_ssthresh at
4 x advmss. No matter how much TCP memory you end up using, the kernel
will clamp based on responsiveness to memory reclaim, which in turn
depends on swap presence. We're seeing it in production with tens of
thousands of sockets and high max tcp_rmem, and I'm able to replicate
the same issue in my VM with the default sysctl values.

> If for your workload you want to ensure a minimum amount of memory per
> TCP socket,
> that might be good enough.

That's not my goal at all. We don't have a problem with TCP memory
consumption. Our issue is low throughput because vmpressure() thinks
that the cgroup is memory constrained when it most definitely is not.
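
To put a rough number on the clamp described above, here is a minimal
userspace sketch of the arithmetic. It is not kernel code: the function
names are hypothetical, and the advmss, RTT and unclamped window values
are assumptions chosen only for illustration. It models the rule that
rcv_ssthresh is capped at 4 x advmss while the socket is considered
under memory pressure, and uses the usual "at most one window per round
trip" bound to estimate the resulting per-connection ceiling.

```python
def clamp_rcv_ssthresh(rcv_ssthresh: int, advmss: int, under_pressure: bool) -> int:
    """Model of the clamp: under memory pressure, cap at 4 * advmss."""
    if under_pressure:
        return min(rcv_ssthresh, 4 * advmss)
    return rcv_ssthresh


def max_throughput_bytes_per_sec(window_bytes: int, rtt_ms: int) -> float:
    """Rough upper bound for one connection: one window per round trip."""
    return window_bytes * 1000 / rtt_ms


# All three values below are assumptions for illustration only.
ADVMSS = 1448                 # assumed MSS (1500-byte MTU with TCP timestamps)
RTT_MS = 100                  # assumed round-trip time in milliseconds
UNCLAMPED = 4 * 1024 * 1024   # assumed rcv_ssthresh before pressure is signalled

clamped = clamp_rcv_ssthresh(UNCLAMPED, ADVMSS, under_pressure=True)
print("clamped rcv_ssthresh:", clamped, "bytes")
print("throughput ceiling:", max_throughput_bytes_per_sec(clamped, RTT_MS), "bytes/s")
```

Under these assumed numbers the window drops from 4 MiB to under 6 KB,
and raising the tcp_rmem maximum does not change the result, since the
cap depends only on advmss once pressure is signalled.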