Re: Regression from 5.7.17 to 5.9.9 with memory.low cgroup constraints

From: "Bruno Prémont" <bonbons@linux-vserver.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
	Chris Down <chris@chrisdown.name>,
	Johannes Weiner <hannes@cmpxchg.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: Regression from 5.7.17 to 5.9.9 with memory.low cgroup constraints
Date: Wed, 25 Nov 2020 15:33:50 +0100
Message-ID: <20201125153350.0af98d93@hemera>
In-Reply-To: <20201125133740.GE31550@dhcp22.suse.cz>

Hi Michal,

On Wed, 25 Nov 2020 14:37:40 +0100 Michal Hocko <mhocko@suse.com> wrote:
> Hi,
> thanks for the detailed report.
> 
> On Wed 25-11-20 12:39:56, Bruno Prémont wrote:
> [...]
> > Did memory.low meaning change between 5.7 and 5.9?  
> 
> The latest change in the low limit protection semantics was
> introduced in 5.7 (recursive protection), but it requires explicit
> enabling.

No specific mount options are set for the v2 cgroup, so it is not active.
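
For readers who want to check this on their own system, here is a minimal Python sketch. It assumes the recursive protection mentioned above shows up as the `memory_recursiveprot` option on the cgroup2 entry in /proc/mounts. The helper name `recursiveprot_enabled` is made up for illustration.

```python
def recursiveprot_enabled(mounts_text):
    """Return True if any cgroup2 mount lists memory_recursiveprot.

    mounts_text is the content of /proc/mounts (assumption: the option
    is visible there under this name).
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup2":
            if "memory_recursiveprot" in fields[3].split(","):
                return True
    return False
```

It could be called as `recursiveprot_enabled(open("/proc/mounts").read())`. A result of False would match the "not active" case described here.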

> > From behavior it
> > feels as if inodes are not accounted to cgroup at all and kernel pushes
> > cgroups down to their memory.low by killing file cache if there is not
> > enough free memory to hold all promises (and not only when a cgroup
> > tries to use up to its promised amount of memory).  
> 
> Your counters indeed show that the low protection has been breached,
> most likely because the reclaim couldn't make any progress. Considering
> that this is the case for all/most of your cgroups it suggests that the
> memory pressure was global rather than limit imposed. In fact even top
> level cgroups got reclaimed below the low limit.

Note that the "original" counters were partially triggered by a first
event where I had one cgroup (websrv) with a very high memory.low
(16G or even 32G), which caused counters everywhere to increase.


So before the last thrashing episode, during which the values were
collected, the event counters and `current` looked as follows:

system/memory.pressure
  some avg10=0.04 avg60=0.28 avg300=0.12 total=5844917510
  full avg10=0.04 avg60=0.26 avg300=0.11 total=2439353404
system/memory.current
  96432128
system/memory.events.local
  low      5399469   (unchanged)
  high     0
  max      112303    (unchanged)
  oom      0
  oom_kill 0

system/base/memory.pressure
  some avg10=0.04 avg60=0.28 avg300=0.12 total=4589562039
  full avg10=0.04 avg60=0.28 avg300=0.12 total=1926984197
system/base/memory.current
  59305984
system/base/memory.events.local
  low      0   (unchanged)
  high     0
  max      0   (unchanged)
  oom      0
  oom_kill 0

system/backup/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=2123293649
  full avg10=0.00 avg60=0.00 avg300=0.00 total=815450446
system/backup/memory.current
  32444416
system/backup/memory.events.local
  low      5446   (unchanged)
  high     0
  max      0
  oom      0
  oom_kill 0

system/shell/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=1345965660
  full avg10=0.00 avg60=0.00 avg300=0.00 total=492812915
system/shell/memory.current
  4571136
system/shell/memory.events.local
  low      0
  high     0
  max      0
  oom      0
  oom_kill 0

website/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=415008878
  full avg10=0.00 avg60=0.00 avg300=0.00 total=201868483
website/memory.current
  12104380416
website/memory.events.local
  low      11264569  (during thrashing: 11372142 then 11377350)
  high     0
  max      0
  oom      0
  oom_kill 0

remote/memory.pressure
  some avg10=0.00 avg60=0.00 avg300=0.00 total=2005130126
  full avg10=0.00 avg60=0.00 avg300=0.00 total=735366752
remote/memory.current
  116330496
remote/memory.events.local
  low      11264569  (during thrashing: 11372142 then 11377350)
  high     0
  max      0
  oom      0
  oom_kill 0

websrv/memory.pressure
  some avg10=0.02 avg60=0.11 avg300=0.03 total=6650355162
  full avg10=0.02 avg60=0.11 avg300=0.03 total=2034584579
websrv/memory.current
  18483359744
websrv/memory.events.local
  low      0
  high     0
  max      0
  oom      0
  oom_kill 0
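
The dumps above use the usual cgroup v2 file formats, so they can be post-processed mechanically. Below is a minimal Python sketch, not a definitive tool. The helper names `parse_pressure` and `parse_events` are hypothetical, and the events parser deliberately ignores hand-written annotations such as "(unchanged)".

```python
def parse_pressure(text):
    """Parse memory.pressure content into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        entry = {}
        for field in fields:
            key, _, value = field.partition("=")
            # "total" is cumulative stall time in microseconds;
            # avg10/avg60/avg300 are percentages over that window.
            entry[key] = int(value) if key == "total" else float(value)
        result[kind] = entry
    return result


def parse_events(text):
    """Parse memory.events or memory.events.local into {event: count}.

    Only the first two columns are read, so trailing notes are skipped.
    """
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            events[parts[0]] = int(parts[1])
    return events
```

Comparing two such snapshots gives the growth of a counter between samples. With the `low` values quoted for website, that is 11372142 - 11264569 = 107573 events up to the first sample taken during thrashing, and a further 11377350 - 11372142 = 5208 up to the second.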


> This suggests that this is not likely to be memcg specific. It is
> more likely that this is a general memory reclaim regression for your
> workload. There were larger changes in that area. Be it lru balancing
> based on cost model by Johannes or working set tracking for anonymous
> pages by Joonsoo. Maybe even more. Both of them can influence page cache
> reclaim but you are suggesting that slab accounted memory is not
> reclaimed properly.

That is my impression, yes. I have no idea, though, whether memcg can
influence how reclaim does its work, or whether slab_reclaimable memory
not associated with any (child) cgroup would somehow be excluded from
reclaim.

> I am not sure there were considerable changes
> there. Would it be possible to collect /proc/vmstat as well?

I will have a look at gathering memory.stat and /proc/vmstat at the next
opportunity.
I will first try a test system with little memory and lots of files, to
reproduce about 50% of memory being used by slab_reclaimable, and see
how far I get.
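
As a rough illustration of how such snapshots could be compared, here is a minimal Python sketch. It relies on /proc/vmstat and memory.stat both being flat "name value" files. It assumes memory.stat reports `slab_reclaimable` in bytes, the same unit as memory.current. All helper names are hypothetical.

```python
def parse_flat_keyed(text):
    """Parse "name value" lines, as found in /proc/vmstat and memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def diff_counters(before, after):
    """Return only the counters whose value changed between two snapshots."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }


def slab_reclaimable_share(memory_stat, memory_current):
    """Fraction of a cgroup's charged memory that is reclaimable slab.

    memory_stat is a dict from parse_flat_keyed(); memory_current is the
    integer read from memory.current. Returns 0.0 for an empty cgroup.
    """
    if not memory_current:
        return 0.0
    return memory_stat.get("slab_reclaimable", 0) / memory_current
```

Taking one snapshot before and one after a thrashing episode and passing both to `diff_counters` would show which reclaim-related counters moved, and `slab_reclaimable_share` gives a quick check on whether the roughly 50% target has been reached.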

Thanks,
Bruno
