From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
xemul@openvz.org, "hugh@veritas.com" <hugh@veritas.com>
Subject: Re: [PATCH 4/7] memcg: page migration
Date: Tue, 18 Mar 2008 23:41:41 +0530
Message-ID: <20080318181141.GD24473@balbir.in.ibm.com>
In-Reply-To: <20080314191543.7b0f0fa3.kamezawa.hiroyu@jp.fujitsu.com>
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-03-14 19:15:43]:
> Christoph Lameter, the author of page migration, suggested to me that
> the new page_cgroup should be assigned to the new page when the page is
> allocated. This patch changes the migration path to assign the
> page_cgroup of the new page at allocation time.
>
> Pros:
> - We can avoid complicated lock dependencies.
> Cons:
> - Have to handle a page which is not on the LRU in the memory resource controller.
>
> For pages not on the LRU, I added PAGE_CGROUP_FLAG_MIGRATION and a
> mem_cgroup->migrations counter.
> (force_empty will not finish while a migration is in progress, because
> the new page's refcnt stays alive until the end of the migration.)
>
> I think this version simplifies the complicated lock dependencies in page
> migration, but I admit it adds some hacky code. If you have a better idea,
> please advise me.
>
> Works well under my tests.

This code is easier to read as well. I think this is a good approach. To
be honest, I've not had the chance to test page migration very often.
Should we update Documentation/controllers/memory.txt to indicate that
migration might prevent force_empty and hence rmdir() from completing
until the migration finishes?
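Maybe something along these lines for memory.txt (rough wording, please
adjust as you see fit):

	Note: during page migration the new page is charged before it is
	put on the LRU, so mem_cgroup_force_empty() (and therefore
	rmdir() of the cgroup) may have to wait until any in-flight
	migrations complete.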
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>
>
>
> include/linux/memcontrol.h | 5 -
> include/linux/page_cgroup.h | 1
> mm/memcontrol.c | 153 +++++++++++++++++++++++++++-----------------
> mm/migrate.c | 22 ++++--
> 4 files changed, 113 insertions(+), 68 deletions(-)
>
> Index: mm-2.6.25-rc5-mm1/include/linux/memcontrol.h
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/include/linux/memcontrol.h
> +++ mm-2.6.25-rc5-mm1/include/linux/memcontrol.h
> @@ -48,9 +48,8 @@ int task_in_mem_cgroup(struct task_struc
> #define mm_match_cgroup(mm, cgroup) \
> ((cgroup) == rcu_dereference((mm)->mem_cgroup))
>
> -extern int mem_cgroup_prepare_migration(struct page *page);
> -extern void mem_cgroup_end_migration(struct page *page);
> -extern void mem_cgroup_page_migration(struct page *page, struct page *newpage);
> +extern int mem_cgroup_prepare_migration(struct page *, struct page *);
> +extern void mem_cgroup_end_migration(struct page *);
>
> /*
> * For memory reclaim.
> Index: mm-2.6.25-rc5-mm1/include/linux/page_cgroup.h
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/include/linux/page_cgroup.h
> +++ mm-2.6.25-rc5-mm1/include/linux/page_cgroup.h
> @@ -23,6 +23,7 @@ struct page_cgroup {
> /* flags */
> #define PAGE_CGROUP_FLAG_CACHE (0x1) /* charged as cache. */
> #define PAGE_CGROUP_FLAG_ACTIVE (0x2) /* is on active list */
> +#define PAGE_CGROUP_FLAG_MIGRATION (0x4) /* under page migration, not on LRU */
>
> /*
> * Lookup and return page_cgroup struct.
> Index: mm-2.6.25-rc5-mm1/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/mm/memcontrol.c
> +++ mm-2.6.25-rc5-mm1/mm/memcontrol.c
> @@ -147,6 +147,8 @@ struct mem_cgroup {
> * statistics.
> */
> struct mem_cgroup_stat stat;
> + /* is a migration under way? */
> + atomic_t migrations;
> };
> static struct mem_cgroup init_mem_cgroup;
>
> @@ -164,6 +166,9 @@ static enum zone_type page_cgroup_zid(st
> enum charge_type {
> MEM_CGROUP_CHARGE_TYPE_CACHE = 0,
> MEM_CGROUP_CHARGE_TYPE_MAPPED,
> + /* this 2 types are not linked to LRU */
^^^^^
these
> + MEM_CGROUP_CHARGE_TYPE_MIGRATION_CACHE,
> + MEM_CGROUP_CHARGE_TYPE_MIGRATION_MAPPED,
> };
>
> /*
> @@ -480,7 +485,8 @@ unsigned long mem_cgroup_isolate_pages(u
> * < 0 if the cgroup is over its limit
> */
> static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
> - gfp_t gfp_mask, enum charge_type ctype)
> + gfp_t gfp_mask, enum charge_type ctype,
> + struct mem_cgroup *memcg)

I think it would be a good idea to add a comment stating that memcg is
non-NULL only in the migration case.
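Something like this above the function, perhaps (just a sketch, untested):

	/*
	 * @memcg is non-NULL only when charging a not-yet-on-LRU page
	 * from mem_cgroup_prepare_migration(); in all other cases the
	 * cgroup is derived from @mm (or init_mm when @mm is NULL).
	 */
	static int mem_cgroup_charge_common(struct page *page,
				struct mm_struct *mm, gfp_t gfp_mask,
				enum charge_type ctype,
				struct mem_cgroup *memcg)
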
> {
> struct mem_cgroup *mem;
> struct page_cgroup *pc;
> @@ -514,16 +520,21 @@ static int mem_cgroup_charge_common(stru
> * thread group leader migrates. It's possible that mm is not
> * set, if so charge the init_mm (happens for pagecache usage).
> */
> - if (!mm)
> + if (!mm && !memcg)
> mm = &init_mm;
>
> - rcu_read_lock();
> - mem = rcu_dereference(mm->mem_cgroup);
> - /*
> - * For every charge from the cgroup, increment reference count
> - */
> - css_get(&mem->css);
> - rcu_read_unlock();
> + if (mm) {
> + rcu_read_lock();
> + mem = rcu_dereference(mm->mem_cgroup);
> + /*
> + * For every charge from the cgroup, increment reference count
> + */
> + css_get(&mem->css);
> + rcu_read_unlock();
> + } else {
> + mem = memcg;
> + css_get(&mem->css);
> + }
>
> while (res_counter_charge(&mem->res, PAGE_SIZE)) {
> if (!(gfp_mask & __GFP_WAIT))
> @@ -566,12 +577,24 @@ static int mem_cgroup_charge_common(stru
> pc->refcnt = 1;
> pc->mem_cgroup = mem;
> pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
> - if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
> + if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE ||
> + ctype == MEM_CGROUP_CHARGE_TYPE_MIGRATION_CACHE)
> pc->flags |= PAGE_CGROUP_FLAG_CACHE;
> - mz = page_cgroup_zoneinfo(pc);
> - spin_lock(&mz->lru_lock);
> - __mem_cgroup_add_list(pc);
> - spin_unlock(&mz->lru_lock);
> +
> + if (ctype == MEM_CGROUP_CHARGE_TYPE_MIGRATION_MAPPED ||
> + ctype == MEM_CGROUP_CHARGE_TYPE_MIGRATION_CACHE)
> + pc->flags |= PAGE_CGROUP_FLAG_MIGRATION;
> +
> + if (pc->flags & PAGE_CGROUP_FLAG_MIGRATION) {
> + /* just remember there is migration */
> + atomic_inc(&mem->migrations);
> + } else {
> + /* add to LRU */
> + mz = page_cgroup_zoneinfo(pc);
> + spin_lock(&mz->lru_lock);
> + __mem_cgroup_add_list(pc);
> + spin_unlock(&mz->lru_lock);
> + }
> spin_unlock_irqrestore(&pc->lock, flags);
>
> success:
> @@ -584,7 +607,7 @@ nomem:
> int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
> {
> return mem_cgroup_charge_common(page, mm, gfp_mask,
> - MEM_CGROUP_CHARGE_TYPE_MAPPED);
> + MEM_CGROUP_CHARGE_TYPE_MAPPED, NULL);
> }
>
> int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> @@ -593,7 +616,7 @@ int mem_cgroup_cache_charge(struct page
> if (!mm)
> mm = &init_mm;
> return mem_cgroup_charge_common(page, mm, gfp_mask,
> - MEM_CGROUP_CHARGE_TYPE_CACHE);
> + MEM_CGROUP_CHARGE_TYPE_CACHE, NULL);
> }
>
> /*
> @@ -637,65 +660,75 @@ void mem_cgroup_uncharge_page(struct pag
> }
>
> /*
> - * Returns non-zero if a page (under migration) has valid page_cgroup member.
> - * Refcnt of page_cgroup is incremented.
> + * Pre-charge against newpage while moving a page.
> + * This function is called before taking page locks.
> */
> -int mem_cgroup_prepare_migration(struct page *page)
> +int mem_cgroup_prepare_migration(struct page *page, struct page *newpage)
> {
> struct page_cgroup *pc;
> + struct mem_cgroup *mem = NULL;
> + int ret = 0;
> + enum charge_type type;
> + unsigned long flags;
>
> if (mem_cgroup_subsys.disabled)
> - return 0;
> + return ret;
> + /* check newpage isn't under memory resource control */
> + pc = get_page_cgroup(newpage, GFP_ATOMIC, false);
> + VM_BUG_ON(pc && pc->refcnt);
>
> - lock_page_cgroup(page);
> - pc = page_get_page_cgroup(page);
> - if (pc)
> - pc->ref_cnt++;
> - unlock_page_cgroup(page);
> - return pc != NULL;
> -}
> + pc = get_page_cgroup(page, GFP_ATOMIC, false);
> + spin_lock_irqsave(&pc->lock, flags);
> + if (pc && pc->refcnt) {
> + mem = pc->mem_cgroup;
> + if (pc->flags & PAGE_CGROUP_FLAG_CACHE)
> + type = MEM_CGROUP_CHARGE_TYPE_MIGRATION_CACHE;
> + else
> + type = MEM_CGROUP_CHARGE_TYPE_MIGRATION_MAPPED;
> + }
> + spin_unlock_irqrestore(&pc->lock, flags);
>
> -void mem_cgroup_end_migration(struct page *page)
> -{
> - mem_cgroup_uncharge_page(page);
> + if (mem) {
> + ret = mem_cgroup_charge_common(newpage, NULL, GFP_KERNEL,
> + type, mem);
> + }
> + return ret;
> }
> -
> /*
> - * We know both *page* and *newpage* are now not-on-LRU and PG_locked.
> - * And no race with uncharge() routines because page_cgroup for *page*
> - * has extra one reference by mem_cgroup_prepare_migration.
> + * At the end of migration, we push newpage onto the LRU and
> + * drop the refcnt which was added at prepare_migration.
> */
> -void mem_cgroup_page_migration(struct page *page, struct page *newpage)
> +void mem_cgroup_end_migration(struct page *newpage)
> {
> struct page_cgroup *pc;
> struct mem_cgroup_per_zone *mz;
> + struct mem_cgroup *mem;
> unsigned long flags;
> + int moved = 0;
>
> - lock_page_cgroup(page);
> - pc = page_get_page_cgroup(page);
> - if (!pc) {
> - unlock_page_cgroup(page);
> + if (mem_cgroup_subsys.disabled)
> return;
> - }
> -
> - mz = page_cgroup_zoneinfo(pc);
> - spin_lock_irqsave(&mz->lru_lock, flags);
> - __mem_cgroup_remove_list(pc);
> - spin_unlock_irqrestore(&mz->lru_lock, flags);
> -
> - page_assign_page_cgroup(page, NULL);
> - unlock_page_cgroup(page);
> -
> - pc->page = newpage;
> - lock_page_cgroup(newpage);
> - page_assign_page_cgroup(newpage, pc);
>
> - mz = page_cgroup_zoneinfo(pc);
> - spin_lock_irqsave(&mz->lru_lock, flags);
> - __mem_cgroup_add_list(pc);
> - spin_unlock_irqrestore(&mz->lru_lock, flags);
> -
> - unlock_page_cgroup(newpage);
> + pc = get_page_cgroup(newpage, GFP_ATOMIC, false);
> + if (!pc)
> + return;
> + spin_lock_irqsave(&pc->lock, flags);
> + if (pc->flags & PAGE_CGROUP_FLAG_MIGRATION) {
> + pc->flags &= ~PAGE_CGROUP_FLAG_MIGRATION;
> + mem = pc->mem_cgroup;
> + mz = page_cgroup_zoneinfo(pc);
> + spin_lock(&mz->lru_lock);
> + __mem_cgroup_add_list(pc);
> + spin_unlock(&mz->lru_lock);
> + moved = 1;
> + }
> + spin_unlock_irqrestore(&pc->lock, flags);
> + if (!pc)
> + return;
> + if (moved) {
> + mem_cgroup_uncharge_page(newpage);

Not sure I understand this part correctly. Why uncharge on move? Is it
just to drop the extra reference count we took earlier? Also, should we
disallow task migration (moving a task to a different cgroup) while page
migration is in progress?
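If the intent is just to drop the reference taken in
mem_cgroup_prepare_migration() (with the charge added again via
mem_cgroup_cache_charge() in migrate_page_move_mapping()), a comment here
would make that obvious. Something like this, assuming my reading of the
flow is correct:

	if (moved) {
		/*
		 * Drop the charge/refcnt taken on newpage by
		 * mem_cgroup_prepare_migration(); the page keeps the
		 * charge added in migrate_page_move_mapping().
		 */
		mem_cgroup_uncharge_page(newpage);
		atomic_dec(&mem->migrations);
	}
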
> + atomic_dec(&mem->migrations);
> + }
> }
>
> /*
> @@ -757,6 +790,10 @@ static int mem_cgroup_force_empty(struct
> while (mem->res.usage > 0) {
> if (atomic_read(&mem->css.cgroup->count) > 0)
> goto out;
> + if (atomic_read(&mem->migrations)) {
> + cond_resched();
> + continue;
> + }
> for_each_node_state(node, N_POSSIBLE)
> for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> struct mem_cgroup_per_zone *mz;
> Index: mm-2.6.25-rc5-mm1/mm/migrate.c
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/mm/migrate.c
> +++ mm-2.6.25-rc5-mm1/mm/migrate.c
> @@ -358,6 +358,12 @@ static int migrate_page_move_mapping(str
>
> write_unlock_irq(&mapping->tree_lock);
>
> + /* by mem_cgroup_prepare_migration, newpage is already
> + assigned to valid cgroup. and current->mm and GFP_ATOMIC
> + will not be used...*/
> + mem_cgroup_uncharge_page(page);
> + mem_cgroup_cache_charge(newpage, current->mm ,GFP_ATOMIC);
> +
> return 0;
> }
>
> @@ -603,7 +609,6 @@ static int move_to_new_page(struct page
> rc = fallback_migrate_page(mapping, newpage, page);
>
> if (!rc) {
> - mem_cgroup_page_migration(page, newpage);
> remove_migration_ptes(page, newpage);
> } else
> newpage->mapping = NULL;
> @@ -633,6 +638,12 @@ static int unmap_and_move(new_page_t get
> /* page was freed from under us. So we are done. */
> goto move_newpage;
>
> + charge = mem_cgroup_prepare_migration(page, newpage);
> + if (charge == -ENOMEM) {
> + rc = -ENOMEM;
> + goto move_newpage;
> + }
> +
> rc = -EAGAIN;
> if (TestSetPageLocked(page)) {
> if (!force)
> @@ -684,19 +695,14 @@ static int unmap_and_move(new_page_t get
> goto rcu_unlock;
> }
>
> - charge = mem_cgroup_prepare_migration(page);
> /* Establish migration ptes or remove ptes */
> try_to_unmap(page, 1);
>
> if (!page_mapped(page))
> rc = move_to_new_page(newpage, page);
>
> - if (rc) {
> + if (rc)
> remove_migration_ptes(page, page);
> - if (charge)
> - mem_cgroup_end_migration(page);
> - } else if (charge)
> - mem_cgroup_end_migration(newpage);
> rcu_unlock:
> if (rcu_locked)
> rcu_read_unlock();
> @@ -717,6 +723,8 @@ unlock:
> }
>
> move_newpage:
> + if (!charge)
> + mem_cgroup_end_migration(newpage);
> /*
> * Move the new page to the LRU. If migration was not successful
> * then this will free the page.
>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL