From: Li Zefan <lizf@cn.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
xemul@openvz.org, "hugh@veritas.com" <hugh@veritas.com>
Subject: Re: [PATCH 7/7] memcg: freeing page_cgroup at suitable chance
Date: Mon, 17 Mar 2008 12:10:40 +0900
Message-ID: <47DDE130.4040509@cn.fujitsu.com>
In-Reply-To: <20080314192253.edb38762.kamezawa.hiroyu@jp.fujitsu.com>
KAMEZAWA Hiroyuki wrote:
> This patch is for freeing page_cgroup if a chunk of pages are freed.
>
> How this works
> * when the order of free page reaches PCGRP_SHRINK_ORDER, pcgrp is freed.
> This will be done by RCU.
>
> I think this works well because
> - unnecessary freeing will not occur in busy servers.
> - page_cgroup will be removed at necessary point (allocating Hugepage,etc..)
> - If tons of pages are freed (ex. big file is removed), page_cgroup will
> be removed.
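For reference, the trigger described above can be modelled in userspace. The model combines the #if block that picks PCGRP_SHRINK_ORDER in page_cgroup.h with the new order check in __free_one_page(). This is a sketch, not the patch's code. MODEL_MAX_ORDER = 11 is an assumed value, and shrink_order() and should_try_shrink() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the kernel takes MAX_ORDER from its config. */
#define MODEL_MAX_ORDER 11

/*
 * Mirrors the #if in page_cgroup.h: PCGRP_SHIFT + 3, clamped to the
 * largest buddy order, MAX_ORDER - 1.
 */
static int shrink_order(int pcgrp_shift)
{
	if (pcgrp_shift + 3 >= MODEL_MAX_ORDER)
		return MODEL_MAX_ORDER - 1;
	return pcgrp_shift + 3;
}

/*
 * Mirrors the check added to __free_one_page(): once buddy merging has
 * produced a free block of 'order', try_to_shrink_page_cgroup() runs
 * only if that order reaches the shrink order.
 */
static bool should_try_shrink(int order, int pcgrp_shift)
{
	return order >= shrink_order(pcgrp_shift);
}
```

In the patch's #else branch, PCGRP_SHRINK_ORDER is MAX_ORDER. No buddy order reaches that value, because orders run from 0 to MAX_ORDER - 1, so the new check in __free_one_page() is always false there.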
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsuc.com>
>
>
> include/linux/page_cgroup.h | 15 +++++++++++-
> mm/page_alloc.c | 3 ++
> mm/page_cgroup.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 71 insertions(+), 1 deletion(-)
>
> Index: mm-2.6.25-rc5-mm1/include/linux/page_cgroup.h
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/include/linux/page_cgroup.h
> +++ mm-2.6.25-rc5-mm1/include/linux/page_cgroup.h
> @@ -39,6 +39,12 @@ DECLARE_PER_CPU(struct page_cgroup_cache
> #define PCGRP_SHIFT (CONFIG_CGROUP_PAGE_CGROUP_ORDER)
> #define PCGRP_SIZE (1 << PCGRP_SHIFT)
>
> +#if PCGRP_SHIFT + 3 >= MAX_ORDER
> +#define PCGRP_SHRINK_ORDER (MAX_ORDER - 1)
> +#else
> +#define PCGRP_SHRINK_ORDER (PCGRP_SHIFT + 3)
> +#endif
> +
> /*
> * Lookup and return page_cgroup struct.
> * returns NULL when
> @@ -70,12 +76,19 @@ get_page_cgroup(struct page *page, gfp_t
> return (ret)? ret : __get_page_cgroup(page, gfpmask, allocate);
> }
>
> +void try_to_shrink_page_cgroup(struct page *page, int order);
> +
extern void try_to_shrink_page_cgroup(struct page *page, int order);
> #else
>
> -static struct page_cgroup *
> +static inline struct page_cgroup *
> get_page_cgroup(struct page *page, gfp_t gfpmask, bool allocate)
> {
> return NULL;
> }
> +static inline void try_to_shrink_page_cgroup(struct page *page, int order)
> +{
> + return;
> +}
> +#define PCGRP_SHRINK_ORDER (MAX_ORDER)
> #endif
> #endif
> Index: mm-2.6.25-rc5-mm1/mm/page_cgroup.c
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/mm/page_cgroup.c
> +++ mm-2.6.25-rc5-mm1/mm/page_cgroup.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/mm.h>
> +#include <linux/mmzone.h>
> #include <linux/slab.h>
> #include <linux/radix-tree.h>
> #include <linux/memcontrol.h>
> @@ -80,6 +81,7 @@ static void save_result(struct page_cgro
> pcp = &__get_cpu_var(pcpu_page_cgroup_cache);
> pcp->ents[hash].idx = idx;
> pcp->ents[hash].base = base;
> + smp_wmb();
Whenever you add a memory barrier, you should comment on it.
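A commented barrier pairing could look like the sketch below. It is a userspace analogue that uses C11 fences rather than smp_wmb()/smp_rmb(). A release fence is somewhat stronger than smp_wmb(), since it also orders earlier loads. struct cache_ent, cache_publish() and cache_lookup() are hypothetical names. The patch author still has to say which reader the smp_wmb() in save_result() pairs with. The sketch only shows the shape of the comment: what is ordered, and against which barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cache slot, loosely modelled on pcp->ents[hash]. */
struct cache_ent {
	_Atomic(void *) base;		/* payload */
	_Atomic(unsigned long) idx;	/* key; caller initialises it to ~0UL (empty) */
};

static void cache_publish(struct cache_ent *e, unsigned long idx, void *base)
{
	atomic_store_explicit(&e->base, base, memory_order_relaxed);
	/*
	 * Release fence (roughly smp_wmb()): the ->base store above becomes
	 * visible before the ->idx store below. Pairs with the acquire
	 * fence in cache_lookup().
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&e->idx, idx, memory_order_relaxed);
}

static void *cache_lookup(struct cache_ent *e, unsigned long idx)
{
	if (atomic_load_explicit(&e->idx, memory_order_relaxed) != idx)
		return NULL;
	/*
	 * Acquire fence (roughly smp_rmb()): the ->base load below cannot
	 * be satisfied before the ->idx load above. Pairs with the release
	 * fence in cache_publish().
	 */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&e->base, memory_order_relaxed);
}
```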
> preempt_enable();
> }
>
> @@ -156,6 +158,58 @@ out:
> return pc;
> }
>
> +/* Must be called under zone->lock */
> +void try_to_shrink_page_cgroup(struct page *page, int order)
> +{
> + unsigned long pfn = page_to_pfn(page);
> + int nid = page_to_nid(page);
> + int idx = pfn >> PCGRP_SHIFT;
> + int hnum = (PAGE_CGROUP_NR_CACHE - 1);
> + struct page_cgroup_cache *pcp;
> + struct page_cgroup_head *head;
> + struct page_cgroup_root *root;
> + unsigned long end_pfn;
> + int cpu;
> +
> +
redundant empty line
> + root = root_dir[nid];
> + if (!root || in_interrupt() || (order < PCGRP_SHIFT))
> + return;
> +
> + pfn = page_to_pfn(page);
> + end_pfn = pfn + (1 << order);
> +
> + while (pfn != end_pfn) {
> + idx = pfn >> PCGRP_SHIFT;
> + /* Is this pfn has entry ? */
> + rcu_read_lock();
> + head = radix_tree_lookup(&root->root_node, idx);
> + rcu_read_unlock();
> + if (!head) {
> + pfn += (1 << PCGRP_SHIFT);
pfn += PCGRP_SIZE;
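The two spellings are equivalent, since the header defines PCGRP_SIZE as (1 << PCGRP_SHIFT). PCGRP_SIZE simply states "one page_cgroup chunk" directly. Below is a userspace sketch of the walk. PCGRP_SHIFT = 5 is an assumed value, and has_chunk_map[] is a hypothetical stand-in for the radix tree.

```c
#include <assert.h>
#include <stdbool.h>

#define PCGRP_SHIFT 5			/* assumed config value */
#define PCGRP_SIZE (1UL << PCGRP_SHIFT)
#define MAP_LEN 64

/* Hypothetical stand-in for radix_tree_lookup() on root->root_node. */
static bool has_chunk_map[MAP_LEN];

/* Counts page_cgroup chunks present in [pfn, pfn + (1 << order)). */
static unsigned long count_chunks(unsigned long pfn, int order)
{
	unsigned long end_pfn = pfn + (1UL << order);
	unsigned long found = 0;

	/* 1 << order must be a whole number of chunks, or != never matches. */
	assert(order >= PCGRP_SHIFT);
	/* Each visited chunk must lie entirely inside the freed range. */
	assert((pfn & (PCGRP_SIZE - 1)) == 0);

	for (; pfn != end_pfn; pfn += PCGRP_SIZE) {	/* one chunk per step */
		unsigned long idx = pfn >> PCGRP_SHIFT;

		if (idx < MAP_LEN && has_chunk_map[idx])
			found++;
	}
	return found;
}
```

Buddy blocks of a given order start on a boundary of that order. Together with the early return for order < PCGRP_SHIFT, this is what the two assertions stand in for.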
> + continue;
> + }
> + /* It's guaranteed that no one access to this pfn/idx
> + because there is no reference to this page. */
> + hnum = (idx) & (PAGE_CGROUP_NR_CACHE - 1);
> + for_each_online_cpu(cpu) {
> + pcp = &per_cpu(pcpu_page_cgroup_cache, cpu);
> + smp_rmb();
> + if (pcp->ents[hnum].idx == idx)
> + pcp->ents[hnum].base = NULL;
> + }
> + if (spin_trylock(&root->tree_lock)) {
> + /* radix tree is freed by RCU. so they will not call
> + free_pages() right now.*/
> + radix_tree_delete(&root->root_node, idx);
> + spin_unlock(&root->tree_lock);
> + /* We can free this in lazy fashion .*/
> + free_page_cgroup(head);
> + }
> + pfn += (1 << PCGRP_SHIFT);
ditto
> + }
> +}
> +
> __init int page_cgroup_init(void)
> {
> int nid;
> Index: mm-2.6.25-rc5-mm1/mm/page_alloc.c
> ===================================================================
> --- mm-2.6.25-rc5-mm1.orig/mm/page_alloc.c
> +++ mm-2.6.25-rc5-mm1/mm/page_alloc.c
> @@ -45,6 +45,7 @@
> #include <linux/fault-inject.h>
> #include <linux/page-isolation.h>
> #include <linux/memcontrol.h>
> +#include <linux/page_cgroup.h>
>
> #include <asm/tlbflush.h>
> #include <asm/div64.h>
> @@ -463,6 +464,8 @@ static inline void __free_one_page(struc
> order++;
> }
> set_page_order(page, order);
> + if (order >= PCGRP_SHRINK_ORDER)
> + try_to_shrink_page_cgroup(page, order);
> list_add(&page->lru,
> &zone->free_area[order].free_list[migratetype]);
> zone->free_area[order].nr_free++;
>
> --
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Thread overview: 53+ messages
2008-03-14 9:59 [PATCH 0/7] memcg: radix-tree page_cgroup KAMEZAWA Hiroyuki
2008-03-14 10:03 ` [PATCH 1/7] re-define page_cgroup KAMEZAWA Hiroyuki
2008-03-16 14:15 ` Balbir Singh
2008-03-18 1:10 ` KAMEZAWA Hiroyuki
2008-03-17 0:21 ` Li Zefan
2008-03-18 1:12 ` KAMEZAWA Hiroyuki
2008-03-17 2:07 ` Li Zefan
2008-03-18 1:11 ` KAMEZAWA Hiroyuki
2008-03-14 10:06 ` [PATCH 2/7] charge/uncharge KAMEZAWA Hiroyuki
2008-03-17 1:46 ` Balbir Singh
2008-03-18 1:14 ` KAMEZAWA Hiroyuki
2008-03-17 2:26 ` Li Zefan
2008-03-18 1:15 ` KAMEZAWA Hiroyuki
2008-03-14 10:07 ` [PATCH 3/7] memcg: move_lists KAMEZAWA Hiroyuki
2008-03-18 16:44 ` Balbir Singh
2008-03-19 2:34 ` KAMEZAWA Hiroyuki
2008-03-14 10:15 ` [PATCH 4/7] memcg: page migration KAMEZAWA Hiroyuki
2008-03-17 2:36 ` Li Zefan
2008-03-18 1:17 ` KAMEZAWA Hiroyuki
2008-03-18 18:11 ` Balbir Singh
2008-03-19 2:44 ` KAMEZAWA Hiroyuki
2008-03-14 10:17 ` [PATCH 5/7] radix-tree page cgroup KAMEZAWA Hiroyuki
2008-03-17 2:56 ` Li Zefan
2008-03-17 3:26 ` Li Zefan
2008-03-18 1:18 ` KAMEZAWA Hiroyuki
2008-03-18 1:23 ` KAMEZAWA Hiroyuki
2008-03-19 2:05 ` Balbir Singh
2008-03-19 2:51 ` KAMEZAWA Hiroyuki
2008-03-19 3:14 ` Balbir Singh
2008-03-19 3:24 ` KAMEZAWA Hiroyuki
2008-03-19 21:11 ` Peter Zijlstra
2008-03-20 4:45 ` KAMEZAWA Hiroyuki
2008-03-20 5:09 ` KAMEZAWA Hiroyuki
2008-03-14 10:18 ` [PATCH 6/7] memcg: speed up by percpu KAMEZAWA Hiroyuki
2008-03-17 3:03 ` Li Zefan
2008-03-18 1:25 ` KAMEZAWA Hiroyuki
2008-03-18 23:55 ` Li Zefan
2008-03-19 2:51 ` KAMEZAWA Hiroyuki
2008-03-19 21:19 ` Peter Zijlstra
2008-03-19 21:41 ` Peter Zijlstra
2008-03-20 9:08 ` Andy Whitcroft
2008-03-20 4:46 ` KAMEZAWA Hiroyuki
2008-03-14 10:22 ` [PATCH 7/7] memcg: freeing page_cgroup at suitable chance KAMEZAWA Hiroyuki
2008-03-17 3:10 ` Li Zefan [this message]
2008-03-18 1:30 ` KAMEZAWA Hiroyuki
2008-03-19 21:33 ` Peter Zijlstra
2008-03-20 5:07 ` KAMEZAWA Hiroyuki
2008-03-20 7:55 ` Peter Zijlstra
2008-03-20 14:49 ` kamezawa.hiroyu
2008-03-20 16:04 ` kamezawa.hiroyu
2008-03-20 16:09 ` Peter Zijlstra
2008-03-20 16:15 ` kamezawa.hiroyu
2008-03-15 6:15 ` [PATCH 0/7] memcg: radix-tree page_cgroup Balbir Singh