From: Johannes Weiner <hannes@cmpxchg.org>
To: Ivan Babrou <ivan@cloudflare.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	cgroups@vger.kernel.org, kernel-team <kernel-team@cloudflare.com>
Subject: Re: Low TCP throughput due to vmpressure with swap enabled
Date: Mon, 28 Nov 2022 13:07:25 -0500
Message-ID: <Y4T43Tc54vlKjTN0@cmpxchg.org>
In-Reply-To: <CABWYdi0qhWs56WK=k+KoQBAMh+Tb6Rr0nY4kJN+E5YqfGhKTmQ@mail.gmail.com>

On Tue, Nov 22, 2022 at 05:28:24PM -0800, Ivan Babrou wrote:
> On Tue, Nov 22, 2022 at 2:11 PM Ivan Babrou <ivan@cloudflare.com> wrote:
> >
> > On Tue, Nov 22, 2022 at 12:05 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Mon, Nov 21, 2022 at 04:53:43PM -0800, Ivan Babrou wrote:
> > > > Hello,
> > > >
> > > > We have observed degraded TCP throughput caused by the following commit:
> > > >
> > > > * 8e8ae645249b mm: memcontrol: hook up vmpressure to socket pressure
> > > >
> > > > It landed back in 2016 in v4.5, so it's not exactly a new issue.
> > > >
> > > > The crux of the issue is that in some cases with swap present the
> > > > workload can be unfairly throttled in terms of TCP throughput.
> > >
> > > Thanks for the detailed analysis, Ivan.
> > >
> > > Originally, we pushed back on sockets only when regular page reclaim
> > > had completely failed and we were about to OOM. This patch was an
> > > attempt to be smarter about it and equalize pressure more smoothly
> > > between socket memory, file cache, anonymous pages.
> > >
> > > After a recent discussion with Shakeel, I'm no longer quite sure the
> > > kernel is the right place to attempt this sort of balancing. It kind
> > > of depends on the workload which type of memory is more important. And
> > > your report shows that vmpressure is a flawed mechanism to implement
> > > this, anyway.
> > >
> > > So I'm thinking we should delete the vmpressure thing, and go back to
> > > socket throttling only if an OOM is imminent. This is in line with
> > > what we do at the system level: sockets get throttled only after
> > > reclaim fails and we hit hard limits. It's then up to the users and
> > > sysadmin to allocate a reasonable amount of buffers given the overall
> > > memory budget.
> > >
> > > Cgroup accounting, limiting and OOM enforcement is still there for the
> > > socket buffers, so misbehaving groups will be contained either way.
> > >
> > > What do you think? Something like the below patch?
> >
> > The idea sounds very reasonable to me. I can't really speak for the
> > patch contents with any sort of authority, but it looks ok to my
> > non-expert eyes.
> >
> > There were some conflicts when cherry-picking this into v5.15. I think
> > the only real one was for the "!sc->proactive" condition not being
> > present there. For the rest I just accepted the incoming change.
> >
> > I'm going to be away from my work computer until December 5th, but
> > I'll try to expedite my backported patch to a production machine today
> > to confirm that it makes the difference. If I can get some approvals
> > on my internal PRs, I should be able to provide the results by EOD
> > tomorrow.
> 
> I tried the patch and something isn't right here.

Thanks for giving it a spin.
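
To make the proposed policy concrete (push back on sockets only once a charge actually fails, instead of reacting to vmpressure levels), here is a minimal userspace model. It assumes a single cgroup with a hard page limit and does not model reclaim at all; `struct model_memcg` and `model_charge_skmem()` are hypothetical names, not kernel APIs. The `nofail` branch mirrors the `__GFP_NOFAIL` handling visible in the diff below, and clearing the flag on success is a modeling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory cgroup. Not kernel code. */
struct model_memcg {
	unsigned long usage;     /* pages currently charged */
	unsigned long limit;     /* hard limit in pages */
	bool socket_pressure;    /* raised only when a socket charge fails */
};

/*
 * "Throttle sockets only when a charge fails": the pressure flag is set
 * only when the charge would exceed the hard limit (reclaim is not
 * modeled). A nofail request is charged anyway, overshooting the limit,
 * but it still signals pressure.
 */
static bool model_charge_skmem(struct model_memcg *m, unsigned int nr_pages,
			       bool nofail)
{
	assert(m != NULL);
	if (m->usage + nr_pages <= m->limit) {
		m->usage += nr_pages;
		m->socket_pressure = false; /* assumption: success clears it */
		return true;
	}
	m->socket_pressure = true;
	if (nofail) {
		m->usage += nr_pages; /* forced charge, may exceed the limit */
		return true;
	}
	return false;
}
```

In this model a cgroup that stays under its limit never sees socket pressure, no matter how much of its other memory is being reclaimed or swapped.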

> With the patch applied I'm capped at ~120MB/s, which is a symptom of a
> clamped window.
> 
> I can't find any sockets with memcg->socket_pressure = 1, but at the
> same time I only see the following rcv_ssthresh assigned to sockets:

Hm, I don't see how socket accounting would alter the network behavior
other than through socket_pressure=1.
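
Reading a ~120MB/s cap as a clamped window follows from the usual bound that a single TCP flow cannot move more than one receive window per round trip, i.e. throughput <= window / RTT. A minimal sketch of that arithmetic; the thread does not give the RTT, so any value plugged in is hypothetical, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on single-flow TCP throughput: at most one receive window
 * can be in flight per round trip, so throughput <= window / RTT.
 * rtt_us is in microseconds; the result is in bytes per second.
 */
static uint64_t max_rate_bytes_per_sec(uint64_t window_bytes, uint64_t rtt_us)
{
	assert(rtt_us > 0);
	return window_bytes * 1000000u / rtt_us;
}

/* Inverse: window (bytes) needed to sustain a target rate at a given RTT. */
static uint64_t window_for_rate_bytes(uint64_t rate_bytes_per_sec,
				      uint64_t rtt_us)
{
	return rate_bytes_per_sec * rtt_us / 1000000u;
}
```

For example, at a hypothetical 10 ms RTT, `window_for_rate_bytes(120000000, 10000)` gives 1,200,000 bytes: a flow pinned at that rate while the path could carry more points at a window that has stopped growing, regardless of which mechanism clamped it.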

How do you look for that flag? If you haven't yet done something
comparable, can you try with tracing to rule out sampling errors?

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 066166aebbef..134b623bee6a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7211,6 +7211,7 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 		goto success;
 	}
 	memcg->socket_pressure = 1;
+	trace_printk("skmem charge failed nr_pages=%u gfp=%pGg\n", nr_pages, &gfp_mask);
 	if (gfp_mask & __GFP_NOFAIL) {
 		try_charge(memcg, gfp_mask, nr_pages);
 		goto success;
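
With the `trace_printk()` above in place, every failed socket charge leaves a line in the ftrace ring buffer, so a nonzero count settles the sampling question. A minimal C sketch for tallying those lines; `count_lines_containing()` is a hypothetical helper, and it assumes the message text from the diff is matched verbatim.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the lines read from fp that contain needle. */
static long count_lines_containing(FILE *fp, const char *needle)
{
	char *line = NULL;
	size_t cap = 0;
	long hits = 0;

	assert(fp != NULL && needle != NULL);
	/* getline() grows the buffer as needed, so long lines are not split. */
	while (getline(&line, &cap, fp) != -1) {
		if (strstr(line, needle) != NULL)
			hits++;
	}
	free(line);
	return hits;
}
```

On a live system this would be pointed at a snapshot of the buffer, e.g. `fopen("/sys/kernel/tracing/trace", "r")` (older setups expose it under `/sys/kernel/debug/tracing/`) with the needle `"skmem charge failed"`. Reading `trace_pipe` instead consumes events and blocks while the buffer is empty, so a count over it only ends when the reader is interrupted.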
