From: Johannes Weiner <jweiner@redhat.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Balbir Singh <bsingharora@gmail.com>,
Ying Han <yinghan@google.com>, Michal Hocko <mhocko@suse.cz>,
Greg Thelen <gthelen@google.com>,
Michel Lespinasse <walken@google.com>,
Rik van Riel <riel@redhat.com>,
Minchan Kim <minchan.kim@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 04/11] mm: memcg: per-priority per-zone hierarchy scan generations
Date: Wed, 14 Sep 2011 07:56:34 +0200
Message-ID: <20110914055634.GA28051@redhat.com>
In-Reply-To: <20110914095504.30fca5d0.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, Sep 14, 2011 at 09:55:04AM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 13 Sep 2011 13:03:01 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
>
> > On Tue, Sep 13, 2011 at 07:27:59PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 12 Sep 2011 12:57:21 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > >
> > > > Memory cgroup limit reclaim currently picks one memory cgroup out of
> > > > the target hierarchy, remembers it as the last scanned child, and
> > > > reclaims all zones in it with decreasing priority levels.
> > > >
> > > > The new hierarchy reclaim code will pick memory cgroups from the same
> > > > hierarchy concurrently from different zones and priority levels, it
> > > > becomes necessary that hierarchy roots not only remember the last
> > > > scanned child, but do so for each zone and priority level.
> > > >
> > > > Furthermore, detecting full hierarchy round-trips reliably will become
> > > > crucial, so instead of counting on one iterator site seeing a certain
> > > > memory cgroup twice, use a generation counter that is increased every
> > > > time the child with the highest ID has been visited.
> > > >
> > > > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> > >
> > > I cannot image how this works. could you illustrate more with easy example ?
> >
> > Previously, we did
> >
> > 	mem = mem_cgroup_iter(root)
> > 	for each priority level:
> > 	  for each zone in zonelist:
> >
> > and this would reclaim memcg-1-zone-1, memcg-1-zone-2, memcg-1-zone-3
> > etc.
> >
> yes.
>
> > The new code does
> >
> > 	for each priority level
> > 	  for each zone in zonelist
> > 	    mem = mem_cgroup_iter(root)
> >
> > but with a single last_scanned_child per memcg, this would scan
> > memcg-1-zone-1, memcg-2-zone-2, memcg-3-zone-3 etc, which does not
> > make much sense.
> >
> > Now imagine two reclaimers. With the old code, the first reclaimer
> > would pick memcg-1 and scan all its zones, the second reclaimer would
> > pick memcg-2 and reclaim all its zones. Without this patch, the first
> > reclaimer would pick memcg-1 and scan zone-1, the second reclaimer
> > would pick memcg-2 and scan zone-1, then the first reclaimer would
> > pick memcg-3 and scan zone-2. If the reclaimers are concurrently
> > scanning at different priority levels, things are even worse because
> > one reclaimer may put much more force on the memcgs it gets from
> > mem_cgroup_iter() than the other reclaimer. They must not share the
> > same iterator.
> >
> > The generations are needed because the old algorithm did not rely too
> > much on detecting full round-trips. After every reclaim cycle, it
> > checked the limit and broke out of the loop if enough was reclaimed,
> > no matter how many children were reclaimed from. The new algorithm is
> > used for global reclaim, where the only exit condition of the
> > hierarchy reclaim is the full roundtrip, because equal pressure needs
> > to be applied to all zones.
> >
> Hm, ok, maybe good for global reclam.
> Is this used for both of reclaim-by-limit and global-reclaim ?

No, for limit reclaim the hierarchy iteration in shrink_zone() stops
after a single memcg, which is equivalent to the old code: scan all
zones at all priority levels from a memcg, then move on to the next
memcg.  This also works because of the per-zone per-priority
last_scanned_child:

	for each priority
	  for each zone
	    mem = mem_cgroup_iter(root)
	    scan(mem)

priority-12 + zone-1 will yield memcg-1.  priority-12 + zone-2 starts
at its own last_scanned_child, so yields memcg-1 as well, etc.  A
second reclaimer that comes in with priority-12 + zone-1 will receive
memcg-2 for scanning.  So there is no change in behaviour for limit
reclaim.

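
To make the difference concrete, here is a small toy model in Python,
not kernel code.  HierarchyRoot, iter_next() and reclaim_pass() are
hypothetical names made up for this sketch, and the two reclaimers run
one after the other rather than concurrently.  It only shows which
memcg each (zone, priority) pair gets with a single shared iterator
versus one iterator per zone and priority:

```python
from collections import defaultdict


class HierarchyRoot:
    """Toy hierarchy root that hands out its children round-robin."""

    def __init__(self, children, per_zone_priority):
        self.children = list(children)
        self.per_zone_priority = per_zone_priority
        # Index of the next child to hand out, one entry per iterator.
        self.next_index = defaultdict(int)

    def iter_next(self, zone, priority):
        # A shared iterator uses one key for every caller; the split
        # variant keeps a separate position per (zone, priority).
        key = (zone, priority) if self.per_zone_priority else None
        index = self.next_index[key]
        self.next_index[key] = (index + 1) % len(self.children)
        return self.children[index]


def reclaim_pass(root, zones, priority):
    """One reclaimer walking all zones at a single priority level."""
    return [(root.iter_next(zone, priority), zone) for zone in zones]


memcgs = ["memcg-1", "memcg-2", "memcg-3"]
zones = ["zone-1", "zone-2", "zone-3"]

# Single last_scanned_child: the memcg changes with every zone.
shared = HierarchyRoot(memcgs, per_zone_priority=False)
shared_first = reclaim_pass(shared, zones, priority=12)

# Per-zone per-priority position: the first reclaimer gets the same
# memcg for every zone, the second reclaimer gets the next one.
split = HierarchyRoot(memcgs, per_zone_priority=True)
split_first = reclaim_pass(split, zones, priority=12)
split_second = reclaim_pass(split, zones, priority=12)

print(shared_first)
print(split_first)
print(split_second)
```

In this model the shared iterator pairs memcg-1 with zone-1, memcg-2
with zone-2 and memcg-3 with zone-3, while the split iterators give
the first pass memcg-1 for all zones and the second pass memcg-2.
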
> If so, I need to abandon node-selection-logic for reclaim-by-limit
> and nodemask-for-memcg which shows me very good result.
> I'll be sad ;)
With my clarification, do you still think so?
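
For the generation counter from the quoted patch description, a
similar toy sketch in Python may help.  ScanIterator, next_child() and
full_round_trip() are hypothetical names, and the stop rule (walk
until the generation changes) is an assumption of this sketch, not a
statement about the actual patch.  The counter is increased every time
the child with the highest ID has been visited:

```python
class ScanIterator:
    """Toy iterator over child IDs with a round-trip generation counter."""

    def __init__(self, child_ids):
        self.child_ids = sorted(child_ids)
        self.position = 0
        self.generation = 0

    def next_child(self):
        child = self.child_ids[self.position]
        self.position += 1
        if self.position == len(self.child_ids):
            # The child with the highest ID was just visited, so one
            # round-trip through the hierarchy is complete.
            self.position = 0
            self.generation += 1
        return child


def full_round_trip(iterator):
    """Visit children until the generation counter changes."""
    start = iterator.generation
    visited = []
    while iterator.generation == start:
        visited.append(iterator.next_child())
    return visited


iterator = ScanIterator([3, 1, 2])
iterator.next_child()  # another walker already consumed child 1
visited = full_round_trip(iterator)
print(visited, iterator.generation)
```

The walker does not need to remember which child it saw first.  It
only compares the generation it started with against the current one,
so a walker that joins in the middle of a round-trip stops at the end
of that round-trip.
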