From: Dave Chinner <david@fromorbit.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>, Linux MM <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function
Date: Sun, 5 Jan 2020 08:23:31 +1100 [thread overview]
Message-ID: <20200104212331.GG23195@dread.disaster.area> (raw)
In-Reply-To: <CALOAHbAzDth8g8+Z5hH9QnOp02UZ5+3eQf9wAQyJM-LAhmaL9A@mail.gmail.com>
On Sat, Jan 04, 2020 at 03:26:13PM +0800, Yafang Shao wrote:
> On Sat, Jan 4, 2020 at 11:36 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Tue, Dec 24, 2019 at 02:53:25AM -0500, Yafang Shao wrote:
> > > The lru walker isolation function may use this memcg to do something, e.g.
> > > the inode isolatation function will use the memcg to do inode protection in
> > > followup patch. So make memcg visible to the lru walker isolation function.
> > >
> > > Something should be emphasized in this patch is it replaces
> > > for_each_memcg_cache_index() with for_each_mem_cgroup() in
> > > list_lru_walk_node(). Because there's a gap between these two MACROs that
> > > for_each_mem_cgroup() depends on CONFIG_MEMCG while the other one depends
> > > on CONFIG_MEMCG_KMEM. But as list_lru_memcg_aware() returns false if
> > > CONFIG_MEMCG_KMEM is not configured, it is safe to this replacement.
> > >
> > > Cc: Dave Chinner <dchinner@redhat.com>
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >
> > ....
> >
> > > @@ -299,17 +299,15 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
> > > list_lru_walk_cb isolate, void *cb_arg,
> > > unsigned long *nr_to_walk)
> > > {
> > > + struct mem_cgroup *memcg;
> > > long isolated = 0;
> > > - int memcg_idx;
> > >
> > > - isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > > - nr_to_walk);
> > > - if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
> > > - for_each_memcg_cache_index(memcg_idx) {
> > > + if (list_lru_memcg_aware(lru)) {
> > > + for_each_mem_cgroup(memcg) {
> > > struct list_lru_node *nlru = &lru->node[nid];
> > >
> > > spin_lock(&nlru->lock);
> > > - isolated += __list_lru_walk_one(nlru, memcg_idx,
> > > + isolated += __list_lru_walk_one(nlru, memcg,
> > > isolate, cb_arg,
> > > nr_to_walk);
> > > spin_unlock(&nlru->lock);
> > > @@ -317,7 +315,11 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
> > > if (*nr_to_walk <= 0)
> > > break;
> > > }
> > > + } else {
> > > + isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > > + nr_to_walk);
> > > }
> > > +
> >
> > That's a change of behaviour. The old code always runs per-node
> > reclaim, then if the LRU is memcg aware it also runs the memcg
> > aware reclaim. The new code never runs global per-node reclaim
> > if the list is memcg aware, so shrinkers that are initialised
> > with the flags SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE seem
> > likely to have reclaim problems with mixed memcg/global memory
> > pressure scenarios.
> >
> > e.g. if all the memory is in the per-node lists, and the memcg needs
> > to reclaim memory because of a global shortage, it is now unable to
> > reclaim global memory.....
> >
>
> Hi Dave,
>
> Thanks for your detailed explanation.
> But I have different understanding.
> The difference between for_each_mem_cgroup(memcg) and
> for_each_memcg_cache_index(memcg_idx) is that the
> for_each_mem_cgroup() includes the root_mem_cgroup while the
> for_each_memcg_cache_index() excludes the root_mem_cgroup because the
> memcg_idx of it is -1.
Except that the "root" memcg that for_each_mem_cgroup() is not the
"global root" memcg - it is whatever memcg that is passed down in
the shrink_control, whereever that sits in the cgroup tree heirarchy.
do_shrink_slab() only ever passes down the global root memcg to the
shrinkers when the global root memcg is passed to shrink_slab(), and
that does not iterate the memcg heirarchy - it just wants to
reclaim from global caches an non-memcg aware shrinkers.
IOWs, there are multiple changes in behaviour here - memcg specific
reclaim won't do global reclaim, and global reclaim will now iterate
all memcgs instead of just the global root memcg.
> So it can reclaim global memory even if the list is memcg aware.
> Is that right ?
If the memcg passed to this fucntion is the root memcg, then yes,
it will behave as you suggest. But for the majority of memcg-context
reclaim, the memcg is not the root memcg and so they will not do
global reclaim anymore...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2020-01-04 21:23 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-24 7:53 [PATCH v2 0/5] protect page cache from freeing inode Yafang Shao
2019-12-24 7:53 ` [PATCH v2 1/5] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
2019-12-26 21:23 ` Roman Gushchin
2019-12-27 1:03 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 2/5] mm, memcg: introduce MEMCG_PROT_SKIP for memcg zero usage case Yafang Shao
2019-12-26 21:36 ` Roman Gushchin
2019-12-27 1:09 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 3/5] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
2019-12-26 21:45 ` Roman Gushchin
2019-12-27 1:11 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function Yafang Shao
2020-01-04 3:35 ` Dave Chinner
2020-01-04 7:26 ` Yafang Shao
2020-01-04 21:23 ` Dave Chinner [this message]
2020-01-05 1:43 ` Yafang Shao
2020-01-06 0:17 ` Dave Chinner
2020-01-06 14:41 ` Yafang Shao
2020-01-06 21:31 ` Dave Chinner
2020-01-07 13:22 ` Yafang Shao
2019-12-24 7:53 ` [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode Yafang Shao
2019-12-25 13:01 ` kbuild test robot
2019-12-25 13:01 ` kbuild test robot
2019-12-25 13:18 ` kbuild test robot
2019-12-25 13:18 ` kbuild test robot
2019-12-26 5:09 ` Yafang Shao
2020-01-04 3:55 ` Dave Chinner
2020-01-04 7:42 ` Yafang Shao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200104212331.GG23195@dread.disaster.area \
--to=david@fromorbit.com \
--cc=akpm@linux-foundation.org \
--cc=dchinner@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=laoar.shao@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=vdavydov.dev@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.