All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ying Han <yinghan@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/10] mm/memcg: take care over pc->mem_cgroup
Date: Tue, 21 Feb 2012 09:55:08 +0400	[thread overview]
Message-ID: <4F4331BC.70205@openvz.org> (raw)
In-Reply-To: <alpine.LSU.2.00.1202201533260.23274@eggly.anvils>

Hugh Dickins wrote:
> page_relock_lruvec() is using lookup_page_cgroup(page)->mem_cgroup
> to find the memcg, and hence its per-zone lruvec for the page.  We
> therefore need to be careful to see the right pc->mem_cgroup: where
> is it updated?
>
> In __mem_cgroup_commit_charge(), under lruvec lock whenever lru
> care might be needed, lrucare holding the page off lru at that time.
>
> In mem_cgroup_reset_owner(), not under lruvec lock, but before the
> page can be visible to others - except compaction or lumpy reclaim,
> which ignore the page because it is not yet PageLRU.
>
> In mem_cgroup_split_huge_fixup(), always under lruvec lock.
>
> In mem_cgroup_move_account(), which holds several locks, but an
> lruvec lock not among them: yet it still appears to be safe, because
> the page has been taken off its old lru and not yet put on the new.
>
> Be particularly careful in compaction's isolate_migratepages() and
> vmscan's lumpy handling in isolate_lru_pages(): those approach the
> page by its physical location, and so can encounter pages which
> would not be found by any logical lookup.  For those cases we have
> to change __isolate_lru_page() slightly: it must leave ClearPageLRU
> to the caller, because compaction and lumpy cannot safely interfere
> with a page until they have first isolated it and then locked lruvec.
>
> To the list above we have to add __mem_cgroup_uncharge_common(),
> and new function mem_cgroup_reset_uncharged_to_root(): the first
> resetting pc->mem_cgroup to root_mem_cgroup when a page off lru is
> uncharged, and the second when an uncharged page is taken off lru
> (which used to be achieved implicitly with the PageAcctLRU flag).
>
> That's because there's a remote risk that compaction or lumpy reclaim
> will spy a page while it has PageLRU set; then it's taken off LRU and
> freed, its mem_cgroup torn down and freed, the page reallocated (so
> get_page_unless_zero again succeeds); then compaction or lumpy reclaim
> reach their page_relock_lruvec, using the stale mem_cgroup for locking.
>
> So long as there's one charge on the mem_cgroup, or a page on one of
> its lrus, mem_cgroup_force_empty() cannot succeed and the mem_cgroup
> cannot be destroyed.  But when an uncharged page is taken off lru,
> or a page off lru is uncharged, it no longer protects its old memcg,
> and the one stable root_mem_cgroup must then be used for it.

This is much better than my RCU-protected locking.
That will be great if it really race-less!
I think, I could steal this and polish a little. =)

But just one question: how appears uncharged pages in mem-cg lru lists?
Maybe we can forbid this case and uncharge these pages right in
__page_cache_release() and release_pages() at final removing from LRU.
This is how my old mem-controller works. There pages in lru are always charged.

>
> Signed-off-by: Hugh Dickins<hughd@google.com>
> ---
>   include/linux/memcontrol.h |    5 ++
>   mm/compaction.c            |   36 ++++++-----------
>   mm/memcontrol.c            |   45 +++++++++++++++++++--
>   mm/swap.c                  |    2
>   mm/vmscan.c                |   73 +++++++++++++++++++++++++----------
>   5 files changed, 114 insertions(+), 47 deletions(-)
>
> --- mmotm.orig/include/linux/memcontrol.h       2012-02-18 11:57:42.675524592 -0800
> +++ mmotm/include/linux/memcontrol.h    2012-02-18 11:57:49.103524745 -0800
> @@ -65,6 +65,7 @@ extern int mem_cgroup_cache_charge(struc
>   struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
>   extern struct mem_cgroup *mem_cgroup_from_lruvec(struct lruvec *lruvec);
>   extern void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
> +extern void mem_cgroup_reset_uncharged_to_root(struct page *);
>
>   /* For coalescing uncharge for reducing memcg' overhead*/
>   extern void mem_cgroup_uncharge_start(void);
> @@ -251,6 +252,10 @@ static inline void mem_cgroup_update_lru
>   {
>   }
>
> +static inline void mem_cgroup_reset_uncharged_to_root(struct page *page)
> +{
> +}
> +
>   static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
>   {
>          return NULL;
> --- mmotm.orig/mm/compaction.c  2012-02-18 11:57:42.675524592 -0800
> +++ mmotm/mm/compaction.c       2012-02-18 11:57:49.103524745 -0800
> @@ -356,28 +356,6 @@ static isolate_migrate_t isolate_migrate
>                          continue;
>                  }
>
> -               if (!lruvec) {
> -                       /*
> -                        * We do need to take the lock before advancing to
> -                        * check PageLRU etc., but there's no guarantee that
> -                        * the page we're peeking at has a stable memcg here.
> -                        */
> -                       lruvec =&zone->lruvec;
> -                       lock_lruvec(lruvec);
> -               }
> -               if (!PageLRU(page))
> -                       continue;
> -
> -               /*
> -                * PageLRU is set, and lru_lock excludes isolation,
> -                * splitting and collapsing (collapsing has already
> -                * happened if PageLRU is set).
> -                */
> -               if (PageTransHuge(page)) {
> -                       low_pfn += (1<<  compound_order(page)) - 1;
> -                       continue;
> -               }
> -
>                  if (!cc->sync)
>                          mode |= ISOLATE_ASYNC_MIGRATE;
>
> @@ -386,10 +364,24 @@ static isolate_migrate_t isolate_migrate
>                          continue;
>
>                  page_relock_lruvec(page,&lruvec);
> +               if (unlikely(!PageLRU(page) || PageUnevictable(page) ||
> +                                               PageTransHuge(page))) {
> +                       /*
> +                        * lru_lock excludes splitting a huge page,
> +                        * but we cannot hold lru_lock while freeing page.
> +                        */
> +                       low_pfn += (1<<  compound_order(page)) - 1;
> +                       unlock_lruvec(lruvec);
> +                       lruvec = NULL;
> +                       put_page(page);
> +                       continue;
> +               }
>
>                  VM_BUG_ON(PageTransCompound(page));
>
>                  /* Successfully isolated */
> +               ClearPageLRU(page);
> +               mem_cgroup_reset_uncharged_to_root(page);
>                  del_page_from_lru_list(page, lruvec, page_lru(page));
>                  list_add(&page->lru, migratelist);
>                  cc->nr_migratepages++;
> --- mmotm.orig/mm/memcontrol.c  2012-02-18 11:57:42.679524592 -0800
> +++ mmotm/mm/memcontrol.c       2012-02-18 11:57:49.107524745 -0800
> @@ -1069,6 +1069,33 @@ void page_relock_lruvec(struct page *pag
>          *lruvp = lruvec;
>   }
>
> +void mem_cgroup_reset_uncharged_to_root(struct page *page)
> +{
> +       struct page_cgroup *pc;
> +
> +       if (mem_cgroup_disabled())
> +               return;
> +
> +       VM_BUG_ON(PageLRU(page));
> +
> +       /*
> +        * Once an uncharged page is isolated from the mem_cgroup's lru,
> +        * it no longer protects that mem_cgroup from rmdir: reset to root.
> +        *
> +        * __page_cache_release() and release_pages() may be called at
> +        * interrupt time: we cannot lock_page_cgroup() then (we might
> +        * have interrupted a section with page_cgroup already locked),
> +        * nor do we need to since the page is frozen and about to be freed.
> +        */
> +       pc = lookup_page_cgroup(page);
> +       if (page_count(page))
> +               lock_page_cgroup(pc);
> +       if (!PageCgroupUsed(pc)&&  pc->mem_cgroup != root_mem_cgroup)
> +               pc->mem_cgroup = root_mem_cgroup;
> +       if (page_count(page))
> +               unlock_page_cgroup(pc);
> +}
> +
>   /**
>    * mem_cgroup_update_lru_size - account for adding or removing an lru page
>    * @lruvec: mem_cgroup per zone lru vector
> @@ -2865,6 +2892,7 @@ __mem_cgroup_uncharge_common(struct page
>          struct mem_cgroup *memcg = NULL;
>          unsigned int nr_pages = 1;
>          struct page_cgroup *pc;
> +       struct lruvec *lruvec;
>          bool anon;
>
>          if (mem_cgroup_disabled())
> @@ -2884,6 +2912,7 @@ __mem_cgroup_uncharge_common(struct page
>          if (unlikely(!PageCgroupUsed(pc)))
>                  return NULL;
>
> +       lruvec = page_lock_lruvec(page);
>          lock_page_cgroup(pc);
>
>          memcg = pc->mem_cgroup;
> @@ -2915,14 +2944,17 @@ __mem_cgroup_uncharge_common(struct page
>          mem_cgroup_charge_statistics(memcg, anon, -nr_pages);
>
>          ClearPageCgroupUsed(pc);
> +
>          /*
> -        * pc->mem_cgroup is not cleared here. It will be accessed when it's
> -        * freed from LRU. This is safe because uncharged page is expected not
> -        * to be reused (freed soon). Exception is SwapCache, it's handled by
> -        * special functions.
> +        * Once an uncharged page is isolated from the mem_cgroup's lru,
> +        * it no longer protects that mem_cgroup from rmdir: reset to root.
>           */
> +       if (!PageLRU(page)&&  pc->mem_cgroup != root_mem_cgroup)
> +               pc->mem_cgroup = root_mem_cgroup;
>
>          unlock_page_cgroup(pc);
> +       unlock_lruvec(lruvec);
> +
>          /*
>           * even after unlock, we have memcg->res.usage here and this memcg
>           * will never be freed.
> @@ -2939,6 +2971,7 @@ __mem_cgroup_uncharge_common(struct page
>
>   unlock_out:
>          unlock_page_cgroup(pc);
> +       unlock_lruvec(lruvec);
>          return NULL;
>   }
>
> @@ -3327,7 +3360,9 @@ static struct page_cgroup *lookup_page_c
>           * the first time, i.e. during boot or memory hotplug;
>           * or when mem_cgroup_disabled().
>           */
> -       if (likely(pc)&&  PageCgroupUsed(pc))
> +       if (!pc || PageCgroupUsed(pc))
> +               return pc;
> +       if (pc->mem_cgroup&&  pc->mem_cgroup != root_mem_cgroup)
>                  return pc;
>          return NULL;
>   }
> --- mmotm.orig/mm/swap.c        2012-02-18 11:57:42.679524592 -0800
> +++ mmotm/mm/swap.c     2012-02-18 11:57:49.107524745 -0800
> @@ -52,6 +52,7 @@ static void __page_cache_release(struct
>                  lruvec = page_lock_lruvec(page);
>                  VM_BUG_ON(!PageLRU(page));
>                  __ClearPageLRU(page);
> +               mem_cgroup_reset_uncharged_to_root(page);
>                  del_page_from_lru_list(page, lruvec, page_off_lru(page));
>                  unlock_lruvec(lruvec);
>          }
> @@ -583,6 +584,7 @@ void release_pages(struct page **pages,
>                          page_relock_lruvec(page,&lruvec);
>                          VM_BUG_ON(!PageLRU(page));
>                          __ClearPageLRU(page);
> +                       mem_cgroup_reset_uncharged_to_root(page);
>                          del_page_from_lru_list(page, lruvec, page_off_lru(page));
>                  }
>
> --- mmotm.orig/mm/vmscan.c      2012-02-18 11:57:42.679524592 -0800
> +++ mmotm/mm/vmscan.c   2012-02-18 11:57:49.107524745 -0800
> @@ -1087,11 +1087,11 @@ int __isolate_lru_page(struct page *page
>
>          if (likely(get_page_unless_zero(page))) {
>                  /*
> -                * Be careful not to clear PageLRU until after we're
> -                * sure the page is not being freed elsewhere -- the
> -                * page release code relies on it.
> +                * Beware of interface change: now leave ClearPageLRU(page)
> +                * to the caller, because memcg's lumpy and compaction
> +                * cases (approaching the page by its physical location)
> +                * may not have the right lru_lock yet.
>                   */
> -               ClearPageLRU(page);
>                  ret = 0;
>          }
>
> @@ -1154,7 +1154,16 @@ static unsigned long isolate_lru_pages(u
>
>                  switch (__isolate_lru_page(page, mode, file)) {
>                  case 0:
> +#ifdef CONFIG_DEBUG_VM
> +                       /* check lock on page is lock we already got */
> +                       page_relock_lruvec(page,&lruvec);
> +                       BUG_ON(lruvec != home_lruvec);
> +                       BUG_ON(page != lru_to_page(src));
> +                       BUG_ON(page_lru(page) != lru);
> +#endif
> +                       ClearPageLRU(page);
>                          isolated_pages = hpage_nr_pages(page);
> +                       mem_cgroup_reset_uncharged_to_root(page);
>                          mem_cgroup_update_lru_size(lruvec, lru, -isolated_pages);
>                          list_move(&page->lru, dst);
>                          nr_taken += isolated_pages;
> @@ -1211,21 +1220,7 @@ static unsigned long isolate_lru_pages(u
>                              !PageSwapCache(cursor_page))
>                                  break;
>
> -                       if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> -                               mem_cgroup_page_relock_lruvec(cursor_page,
> -&lruvec);
> -                               isolated_pages = hpage_nr_pages(cursor_page);
> -                               mem_cgroup_update_lru_size(lruvec,
> -                                       page_lru(cursor_page), -isolated_pages);
> -                               list_move(&cursor_page->lru, dst);
> -
> -                               nr_taken += isolated_pages;
> -                               nr_lumpy_taken += isolated_pages;
> -                               if (PageDirty(cursor_page))
> -                                       nr_lumpy_dirty += isolated_pages;
> -                               scan++;
> -                               pfn += isolated_pages - 1;
> -                       } else {
> +                       if (__isolate_lru_page(cursor_page, mode, file) != 0) {
>                                  /*
>                                   * Check if the page is freed already.
>                                   *
> @@ -1243,13 +1238,50 @@ static unsigned long isolate_lru_pages(u
>                                          continue;
>                                  break;
>                          }
> +
> +                       /*
> +                        * This locking call is a no-op in the non-memcg
> +                        * case, since we already hold the right lru_lock;
> +                        * but it may change the lock in the memcg case.
> +                        * It is then vital to recheck PageLRU (but not
> +                        * necessary to recheck isolation mode).
> +                        */
> +                       mem_cgroup_page_relock_lruvec(cursor_page,&lruvec);
> +
> +                       if (PageLRU(cursor_page)&&
> +                           !PageUnevictable(cursor_page)) {
> +                               ClearPageLRU(cursor_page);
> +                               isolated_pages = hpage_nr_pages(cursor_page);
> +                               mem_cgroup_reset_uncharged_to_root(cursor_page);
> +                               mem_cgroup_update_lru_size(lruvec,
> +                                       page_lru(cursor_page), -isolated_pages);
> +                               list_move(&cursor_page->lru, dst);
> +
> +                               nr_taken += isolated_pages;
> +                               nr_lumpy_taken += isolated_pages;
> +                               if (PageDirty(cursor_page))
> +                                       nr_lumpy_dirty += isolated_pages;
> +                               scan++;
> +                               pfn += isolated_pages - 1;
> +                       } else {
> +                               /* Cannot hold lru_lock while freeing page */
> +                               unlock_lruvec(lruvec);
> +                               lruvec = NULL;
> +                               put_page(cursor_page);
> +                               break;
> +                       }
>                  }
>
>                  /* If we break out of the loop above, lumpy reclaim failed */
>                  if (pfn<  end_pfn)
>                          nr_lumpy_failed++;
>
> -               lruvec = home_lruvec;
> +               if (lruvec != home_lruvec) {
> +                       if (lruvec)
> +                               unlock_lruvec(lruvec);
> +                       lruvec = home_lruvec;
> +                       lock_lruvec(lruvec);
> +               }
>          }
>
>          *nr_scanned = scan;
> @@ -1301,6 +1333,7 @@ int isolate_lru_page(struct page *page)
>                          int lru = page_lru(page);
>                          get_page(page);
>                          ClearPageLRU(page);
> +                       mem_cgroup_reset_uncharged_to_root(page);
>                          del_page_from_lru_list(page, lruvec, lru);
>                          ret = 0;
>                  }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ying Han <yinghan@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/10] mm/memcg: take care over pc->mem_cgroup
Date: Tue, 21 Feb 2012 09:55:08 +0400	[thread overview]
Message-ID: <4F4331BC.70205@openvz.org> (raw)
In-Reply-To: <alpine.LSU.2.00.1202201533260.23274@eggly.anvils>

Hugh Dickins wrote:
> page_relock_lruvec() is using lookup_page_cgroup(page)->mem_cgroup
> to find the memcg, and hence its per-zone lruvec for the page.  We
> therefore need to be careful to see the right pc->mem_cgroup: where
> is it updated?
>
> In __mem_cgroup_commit_charge(), under lruvec lock whenever lru
> care might be needed, lrucare holding the page off lru at that time.
>
> In mem_cgroup_reset_owner(), not under lruvec lock, but before the
> page can be visible to others - except compaction or lumpy reclaim,
> which ignore the page because it is not yet PageLRU.
>
> In mem_cgroup_split_huge_fixup(), always under lruvec lock.
>
> In mem_cgroup_move_account(), which holds several locks, but an
> lruvec lock not among them: yet it still appears to be safe, because
> the page has been taken off its old lru and not yet put on the new.
>
> Be particularly careful in compaction's isolate_migratepages() and
> vmscan's lumpy handling in isolate_lru_pages(): those approach the
> page by its physical location, and so can encounter pages which
> would not be found by any logical lookup.  For those cases we have
> to change __isolate_lru_page() slightly: it must leave ClearPageLRU
> to the caller, because compaction and lumpy cannot safely interfere
> with a page until they have first isolated it and then locked lruvec.
>
> To the list above we have to add __mem_cgroup_uncharge_common(),
> and new function mem_cgroup_reset_uncharged_to_root(): the first
> resetting pc->mem_cgroup to root_mem_cgroup when a page off lru is
> uncharged, and the second when an uncharged page is taken off lru
> (which used to be achieved implicitly with the PageAcctLRU flag).
>
> That's because there's a remote risk that compaction or lumpy reclaim
> will spy a page while it has PageLRU set; then it's taken off LRU and
> freed, its mem_cgroup torn down and freed, the page reallocated (so
> get_page_unless_zero again succeeds); then compaction or lumpy reclaim
> reach their page_relock_lruvec, using the stale mem_cgroup for locking.
>
> So long as there's one charge on the mem_cgroup, or a page on one of
> its lrus, mem_cgroup_force_empty() cannot succeed and the mem_cgroup
> cannot be destroyed.  But when an uncharged page is taken off lru,
> or a page off lru is uncharged, it no longer protects its old memcg,
> and the one stable root_mem_cgroup must then be used for it.

This is much better than my RCU-protected locking.
That will be great if it really race-less!
I think, I could steal this and polish a little. =)

But just one question: how appears uncharged pages in mem-cg lru lists?
Maybe we can forbid this case and uncharge these pages right in
__page_cache_release() and release_pages() at final removing from LRU.
This is how my old mem-controller works. There pages in lru are always charged.

>
> Signed-off-by: Hugh Dickins<hughd@google.com>
> ---
>   include/linux/memcontrol.h |    5 ++
>   mm/compaction.c            |   36 ++++++-----------
>   mm/memcontrol.c            |   45 +++++++++++++++++++--
>   mm/swap.c                  |    2
>   mm/vmscan.c                |   73 +++++++++++++++++++++++++----------
>   5 files changed, 114 insertions(+), 47 deletions(-)
>
> --- mmotm.orig/include/linux/memcontrol.h       2012-02-18 11:57:42.675524592 -0800
> +++ mmotm/include/linux/memcontrol.h    2012-02-18 11:57:49.103524745 -0800
> @@ -65,6 +65,7 @@ extern int mem_cgroup_cache_charge(struc
>   struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
>   extern struct mem_cgroup *mem_cgroup_from_lruvec(struct lruvec *lruvec);
>   extern void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
> +extern void mem_cgroup_reset_uncharged_to_root(struct page *);
>
>   /* For coalescing uncharge for reducing memcg' overhead*/
>   extern void mem_cgroup_uncharge_start(void);
> @@ -251,6 +252,10 @@ static inline void mem_cgroup_update_lru
>   {
>   }
>
> +static inline void mem_cgroup_reset_uncharged_to_root(struct page *page)
> +{
> +}
> +
>   static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
>   {
>          return NULL;
> --- mmotm.orig/mm/compaction.c  2012-02-18 11:57:42.675524592 -0800
> +++ mmotm/mm/compaction.c       2012-02-18 11:57:49.103524745 -0800
> @@ -356,28 +356,6 @@ static isolate_migrate_t isolate_migrate
>                          continue;
>                  }
>
> -               if (!lruvec) {
> -                       /*
> -                        * We do need to take the lock before advancing to
> -                        * check PageLRU etc., but there's no guarantee that
> -                        * the page we're peeking at has a stable memcg here.
> -                        */
> -                       lruvec =&zone->lruvec;
> -                       lock_lruvec(lruvec);
> -               }
> -               if (!PageLRU(page))
> -                       continue;
> -
> -               /*
> -                * PageLRU is set, and lru_lock excludes isolation,
> -                * splitting and collapsing (collapsing has already
> -                * happened if PageLRU is set).
> -                */
> -               if (PageTransHuge(page)) {
> -                       low_pfn += (1<<  compound_order(page)) - 1;
> -                       continue;
> -               }
> -
>                  if (!cc->sync)
>                          mode |= ISOLATE_ASYNC_MIGRATE;
>
> @@ -386,10 +364,24 @@ static isolate_migrate_t isolate_migrate
>                          continue;
>
>                  page_relock_lruvec(page,&lruvec);
> +               if (unlikely(!PageLRU(page) || PageUnevictable(page) ||
> +                                               PageTransHuge(page))) {
> +                       /*
> +                        * lru_lock excludes splitting a huge page,
> +                        * but we cannot hold lru_lock while freeing page.
> +                        */
> +                       low_pfn += (1<<  compound_order(page)) - 1;
> +                       unlock_lruvec(lruvec);
> +                       lruvec = NULL;
> +                       put_page(page);
> +                       continue;
> +               }
>
>                  VM_BUG_ON(PageTransCompound(page));
>
>                  /* Successfully isolated */
> +               ClearPageLRU(page);
> +               mem_cgroup_reset_uncharged_to_root(page);
>                  del_page_from_lru_list(page, lruvec, page_lru(page));
>                  list_add(&page->lru, migratelist);
>                  cc->nr_migratepages++;
> --- mmotm.orig/mm/memcontrol.c  2012-02-18 11:57:42.679524592 -0800
> +++ mmotm/mm/memcontrol.c       2012-02-18 11:57:49.107524745 -0800
> @@ -1069,6 +1069,33 @@ void page_relock_lruvec(struct page *pag
>          *lruvp = lruvec;
>   }
>
> +void mem_cgroup_reset_uncharged_to_root(struct page *page)
> +{
> +       struct page_cgroup *pc;
> +
> +       if (mem_cgroup_disabled())
> +               return;
> +
> +       VM_BUG_ON(PageLRU(page));
> +
> +       /*
> +        * Once an uncharged page is isolated from the mem_cgroup's lru,
> +        * it no longer protects that mem_cgroup from rmdir: reset to root.
> +        *
> +        * __page_cache_release() and release_pages() may be called at
> +        * interrupt time: we cannot lock_page_cgroup() then (we might
> +        * have interrupted a section with page_cgroup already locked),
> +        * nor do we need to since the page is frozen and about to be freed.
> +        */
> +       pc = lookup_page_cgroup(page);
> +       if (page_count(page))
> +               lock_page_cgroup(pc);
> +       if (!PageCgroupUsed(pc)&&  pc->mem_cgroup != root_mem_cgroup)
> +               pc->mem_cgroup = root_mem_cgroup;
> +       if (page_count(page))
> +               unlock_page_cgroup(pc);
> +}
> +
>   /**
>    * mem_cgroup_update_lru_size - account for adding or removing an lru page
>    * @lruvec: mem_cgroup per zone lru vector
> @@ -2865,6 +2892,7 @@ __mem_cgroup_uncharge_common(struct page
>          struct mem_cgroup *memcg = NULL;
>          unsigned int nr_pages = 1;
>          struct page_cgroup *pc;
> +       struct lruvec *lruvec;
>          bool anon;
>
>          if (mem_cgroup_disabled())
> @@ -2884,6 +2912,7 @@ __mem_cgroup_uncharge_common(struct page
>          if (unlikely(!PageCgroupUsed(pc)))
>                  return NULL;
>
> +       lruvec = page_lock_lruvec(page);
>          lock_page_cgroup(pc);
>
>          memcg = pc->mem_cgroup;
> @@ -2915,14 +2944,17 @@ __mem_cgroup_uncharge_common(struct page
>          mem_cgroup_charge_statistics(memcg, anon, -nr_pages);
>
>          ClearPageCgroupUsed(pc);
> +
>          /*
> -        * pc->mem_cgroup is not cleared here. It will be accessed when it's
> -        * freed from LRU. This is safe because uncharged page is expected not
> -        * to be reused (freed soon). Exception is SwapCache, it's handled by
> -        * special functions.
> +        * Once an uncharged page is isolated from the mem_cgroup's lru,
> +        * it no longer protects that mem_cgroup from rmdir: reset to root.
>           */
> +       if (!PageLRU(page)&&  pc->mem_cgroup != root_mem_cgroup)
> +               pc->mem_cgroup = root_mem_cgroup;
>
>          unlock_page_cgroup(pc);
> +       unlock_lruvec(lruvec);
> +
>          /*
>           * even after unlock, we have memcg->res.usage here and this memcg
>           * will never be freed.
> @@ -2939,6 +2971,7 @@ __mem_cgroup_uncharge_common(struct page
>
>   unlock_out:
>          unlock_page_cgroup(pc);
> +       unlock_lruvec(lruvec);
>          return NULL;
>   }
>
> @@ -3327,7 +3360,9 @@ static struct page_cgroup *lookup_page_c
>           * the first time, i.e. during boot or memory hotplug;
>           * or when mem_cgroup_disabled().
>           */
> -       if (likely(pc)&&  PageCgroupUsed(pc))
> +       if (!pc || PageCgroupUsed(pc))
> +               return pc;
> +       if (pc->mem_cgroup&&  pc->mem_cgroup != root_mem_cgroup)
>                  return pc;
>          return NULL;
>   }
> --- mmotm.orig/mm/swap.c        2012-02-18 11:57:42.679524592 -0800
> +++ mmotm/mm/swap.c     2012-02-18 11:57:49.107524745 -0800
> @@ -52,6 +52,7 @@ static void __page_cache_release(struct
>                  lruvec = page_lock_lruvec(page);
>                  VM_BUG_ON(!PageLRU(page));
>                  __ClearPageLRU(page);
> +               mem_cgroup_reset_uncharged_to_root(page);
>                  del_page_from_lru_list(page, lruvec, page_off_lru(page));
>                  unlock_lruvec(lruvec);
>          }
> @@ -583,6 +584,7 @@ void release_pages(struct page **pages,
>                          page_relock_lruvec(page,&lruvec);
>                          VM_BUG_ON(!PageLRU(page));
>                          __ClearPageLRU(page);
> +                       mem_cgroup_reset_uncharged_to_root(page);
>                          del_page_from_lru_list(page, lruvec, page_off_lru(page));
>                  }
>
> --- mmotm.orig/mm/vmscan.c      2012-02-18 11:57:42.679524592 -0800
> +++ mmotm/mm/vmscan.c   2012-02-18 11:57:49.107524745 -0800
> @@ -1087,11 +1087,11 @@ int __isolate_lru_page(struct page *page
>
>          if (likely(get_page_unless_zero(page))) {
>                  /*
> -                * Be careful not to clear PageLRU until after we're
> -                * sure the page is not being freed elsewhere -- the
> -                * page release code relies on it.
> +                * Beware of interface change: now leave ClearPageLRU(page)
> +                * to the caller, because memcg's lumpy and compaction
> +                * cases (approaching the page by its physical location)
> +                * may not have the right lru_lock yet.
>                   */
> -               ClearPageLRU(page);
>                  ret = 0;
>          }
>
> @@ -1154,7 +1154,16 @@ static unsigned long isolate_lru_pages(u
>
>                  switch (__isolate_lru_page(page, mode, file)) {
>                  case 0:
> +#ifdef CONFIG_DEBUG_VM
> +                       /* check lock on page is lock we already got */
> +                       page_relock_lruvec(page,&lruvec);
> +                       BUG_ON(lruvec != home_lruvec);
> +                       BUG_ON(page != lru_to_page(src));
> +                       BUG_ON(page_lru(page) != lru);
> +#endif
> +                       ClearPageLRU(page);
>                          isolated_pages = hpage_nr_pages(page);
> +                       mem_cgroup_reset_uncharged_to_root(page);
>                          mem_cgroup_update_lru_size(lruvec, lru, -isolated_pages);
>                          list_move(&page->lru, dst);
>                          nr_taken += isolated_pages;
> @@ -1211,21 +1220,7 @@ static unsigned long isolate_lru_pages(u
>                              !PageSwapCache(cursor_page))
>                                  break;
>
> -                       if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> -                               mem_cgroup_page_relock_lruvec(cursor_page,
> -&lruvec);
> -                               isolated_pages = hpage_nr_pages(cursor_page);
> -                               mem_cgroup_update_lru_size(lruvec,
> -                                       page_lru(cursor_page), -isolated_pages);
> -                               list_move(&cursor_page->lru, dst);
> -
> -                               nr_taken += isolated_pages;
> -                               nr_lumpy_taken += isolated_pages;
> -                               if (PageDirty(cursor_page))
> -                                       nr_lumpy_dirty += isolated_pages;
> -                               scan++;
> -                               pfn += isolated_pages - 1;
> -                       } else {
> +                       if (__isolate_lru_page(cursor_page, mode, file) != 0) {
>                                  /*
>                                   * Check if the page is freed already.
>                                   *
> @@ -1243,13 +1238,50 @@ static unsigned long isolate_lru_pages(u
>                                          continue;
>                                  break;
>                          }
> +
> +                       /*
> +                        * This locking call is a no-op in the non-memcg
> +                        * case, since we already hold the right lru_lock;
> +                        * but it may change the lock in the memcg case.
> +                        * It is then vital to recheck PageLRU (but not
> +                        * necessary to recheck isolation mode).
> +                        */
> +                       mem_cgroup_page_relock_lruvec(cursor_page,&lruvec);
> +
> +                       if (PageLRU(cursor_page)&&
> +                           !PageUnevictable(cursor_page)) {
> +                               ClearPageLRU(cursor_page);
> +                               isolated_pages = hpage_nr_pages(cursor_page);
> +                               mem_cgroup_reset_uncharged_to_root(cursor_page);
> +                               mem_cgroup_update_lru_size(lruvec,
> +                                       page_lru(cursor_page), -isolated_pages);
> +                               list_move(&cursor_page->lru, dst);
> +
> +                               nr_taken += isolated_pages;
> +                               nr_lumpy_taken += isolated_pages;
> +                               if (PageDirty(cursor_page))
> +                                       nr_lumpy_dirty += isolated_pages;
> +                               scan++;
> +                               pfn += isolated_pages - 1;
> +                       } else {
> +                               /* Cannot hold lru_lock while freeing page */
> +                               unlock_lruvec(lruvec);
> +                               lruvec = NULL;
> +                               put_page(cursor_page);
> +                               break;
> +                       }
>                  }
>
>                  /* If we break out of the loop above, lumpy reclaim failed */
>                  if (pfn<  end_pfn)
>                          nr_lumpy_failed++;
>
> -               lruvec = home_lruvec;
> +               if (lruvec != home_lruvec) {
> +                       if (lruvec)
> +                               unlock_lruvec(lruvec);
> +                       lruvec = home_lruvec;
> +                       lock_lruvec(lruvec);
> +               }
>          }
>
>          *nr_scanned = scan;
> @@ -1301,6 +1333,7 @@ int isolate_lru_page(struct page *page)
>                          int lru = page_lru(page);
>                          get_page(page);
>                          ClearPageLRU(page);
> +                       mem_cgroup_reset_uncharged_to_root(page);
>                          del_page_from_lru_list(page, lruvec, lru);
>                          ret = 0;
>                  }


  reply	other threads:[~2012-02-21  5:55 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-20 23:26 [PATCH 0/10] mm/memcg: per-memcg per-zone lru locking Hugh Dickins
2012-02-20 23:26 ` Hugh Dickins
2012-02-20 23:28 ` [PATCH 1/10] mm/memcg: scanning_global_lru means mem_cgroup_disabled Hugh Dickins
2012-02-20 23:28   ` Hugh Dickins
2012-02-21  8:03   ` KAMEZAWA Hiroyuki
2012-02-21  8:03     ` KAMEZAWA Hiroyuki
2012-02-20 23:29 ` [PATCH 2/10] mm/memcg: move reclaim_stat into lruvec Hugh Dickins
2012-02-20 23:29   ` Hugh Dickins
2012-02-21  8:05   ` KAMEZAWA Hiroyuki
2012-02-21  8:05     ` KAMEZAWA Hiroyuki
2012-02-20 23:30 ` [PATCH 3/10] mm/memcg: add zone pointer " Hugh Dickins
2012-02-20 23:30   ` Hugh Dickins
2012-02-21  8:08   ` KAMEZAWA Hiroyuki
2012-02-21  8:08     ` KAMEZAWA Hiroyuki
2012-02-20 23:32 ` [PATCH 4/10] mm/memcg: apply add/del_page to lruvec Hugh Dickins
2012-02-20 23:32   ` Hugh Dickins
2012-02-21  8:20   ` KAMEZAWA Hiroyuki
2012-02-21  8:20     ` KAMEZAWA Hiroyuki
2012-02-21 22:25     ` Hugh Dickins
2012-02-21 22:25       ` Hugh Dickins
2012-02-20 23:33 ` [PATCH 5/10] mm/memcg: introduce page_relock_lruvec Hugh Dickins
2012-02-20 23:33   ` Hugh Dickins
2012-02-21  8:38   ` KAMEZAWA Hiroyuki
2012-02-21  8:38     ` KAMEZAWA Hiroyuki
2012-02-21 22:36     ` Hugh Dickins
2012-02-21 22:36       ` Hugh Dickins
2012-02-20 23:34 ` [PATCH 6/10] mm/memcg: take care over pc->mem_cgroup Hugh Dickins
2012-02-20 23:34   ` Hugh Dickins
2012-02-21  5:55   ` Konstantin Khlebnikov [this message]
2012-02-21  5:55     ` Konstantin Khlebnikov
2012-02-21 19:37     ` Hugh Dickins
2012-02-21 19:37       ` Hugh Dickins
2012-02-21 20:40       ` Konstantin Khlebnikov
2012-02-21 20:40         ` Konstantin Khlebnikov
2012-02-21 22:05         ` Hugh Dickins
2012-02-21 22:05           ` Hugh Dickins
2012-02-21  6:05   ` Konstantin Khlebnikov
2012-02-21  6:05     ` Konstantin Khlebnikov
2012-02-21 20:00     ` Hugh Dickins
2012-02-21 20:00       ` Hugh Dickins
2012-02-21  9:13   ` KAMEZAWA Hiroyuki
2012-02-21  9:13     ` KAMEZAWA Hiroyuki
2012-02-21 23:03     ` Hugh Dickins
2012-02-21 23:03       ` Hugh Dickins
2012-02-22  4:05       ` Konstantin Khlebnikov
2012-02-22  4:05         ` Konstantin Khlebnikov
2012-02-20 23:35 ` [PATCH 7/10] mm/memcg: remove mem_cgroup_reset_owner Hugh Dickins
2012-02-20 23:35   ` Hugh Dickins
2012-02-21  9:17   ` KAMEZAWA Hiroyuki
2012-02-21  9:17     ` KAMEZAWA Hiroyuki
2012-02-20 23:36 ` [PATCH 8/10] mm/memcg: nest lru_lock inside page_cgroup lock Hugh Dickins
2012-02-20 23:36   ` Hugh Dickins
2012-02-21  9:48   ` KAMEZAWA Hiroyuki
2012-02-21  9:48     ` KAMEZAWA Hiroyuki
2012-02-20 23:38 ` [PATCH 9/10] mm/memcg: move lru_lock into lruvec Hugh Dickins
2012-02-20 23:38   ` Hugh Dickins
2012-02-21  7:08   ` Konstantin Khlebnikov
2012-02-21  7:08     ` Konstantin Khlebnikov
2012-02-21 20:12     ` Hugh Dickins
2012-02-21 20:12       ` Hugh Dickins
2012-02-21 21:35       ` Konstantin Khlebnikov
2012-02-21 21:35         ` Konstantin Khlebnikov
2012-02-21 22:12         ` Hugh Dickins
2012-02-21 22:12           ` Hugh Dickins
2012-02-22  3:43           ` Konstantin Khlebnikov
2012-02-22  3:43             ` Konstantin Khlebnikov
2012-02-22  6:09             ` Hugh Dickins
2012-02-22  6:09               ` Hugh Dickins
2012-02-23 14:21               ` Konstantin Khlebnikov
2012-02-23 14:21                 ` Konstantin Khlebnikov
2012-02-20 23:39 ` [PATCH 10/10] mm/memcg: per-memcg per-zone lru locking Hugh Dickins
2012-02-20 23:39   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F4331BC.70205@openvz.org \
    --to=khlebnikov@openvz.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.