From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Andrea Righi <righi.andrea@gmail.com>
Cc: Paul Menage <menage@google.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk,
baramsori72@gmail.com, Carl Henrik Lunde <chlunde@ping.uio.no>,
dave@linux.vnet.ibm.com, Divyesh Shah <dpshah@google.com>,
eric.rannaud@gmail.com, fernando@oss.ntt.co.jp,
Hirokazu Takahashi <taka@valinux.co.jp>,
Li Zefan <lizf@cn.fujitsu.com>,
matt@bluehost.com, dradford@bluehost.com, ngupta@google.com,
randy.dunlap@oracle.com, roberto@unbit.it,
Ryo Tsuruta <ryov@valinux.co.jp>,
Satoshi UCHIDA <s-uchida@ap.jp.nec.com>,
subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp,
Nauman Rafique <nauman@google.com>,
fchecconi@gmail.com, paolo.valente@unimore.it,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] page_cgroup: provide a generic page tracking infrastructure
Date: Fri, 24 Apr 2009 10:11:09 +0800
Message-ID: <49F11FBD.3070705@cn.fujitsu.com>
In-Reply-To: <1240090712-1058-4-git-send-email-righi.andrea@gmail.com>
Andrea Righi wrote:
> Dirty pages in the page cache can be processed asynchronously by kernel
> threads (pdflush) using a writeback policy. For this reason the real
> writes to the underlying block devices occur in a different IO context
> respect to the task that originally generated the dirty pages involved
> in the IO operation. This makes the tracking and throttling of writeback
> IO more complicate respect to the synchronous IO.
>
> The page_cgroup infrastructure, currently available only for the memory
> cgroup controller, can be used to store the owner of each page and
> opportunely track the writeback IO. This information is encoded in
> page_cgroup->flags.
You encode the ID in page_cgroup->flags. IMHO, if a cgroup gets removed,
you should also clear the corresponding ID in the flags.
One more thing: if a task moves from one cgroup to another, the ID in
the flags also needs to be changed.
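
To make the concern concrete, here is a rough userspace sketch (not
kernel code) of the layout the patch describes: the low 16 bits of flags
hold state bits and the remaining bits hold the owner ID. The struct and
the model_* helpers are made-up names for illustration only, and
model_clear_id_if() is just one hypothetical way a stale ID could be
dropped, using 0 as "no owner" as in the page_cgroup_get_owner() comment.

```c
#include <assert.h>
#include <limits.h>

/* Userspace model of the layout in the quoted patch: the low 16 bits of
 * flags hold state bits, the remaining bits hold the owner ID. */
#define ID_SHIFT	16
#define ID_BITS		(CHAR_BIT * sizeof(unsigned long) - ID_SHIFT)
#define FLAG_MASK	((1UL << ID_SHIFT) - 1)

/* Stand-in for struct page_cgroup; only the flags word is modelled. */
struct pc_model {
	unsigned long flags;
};

/* Extract the owner ID stored above the flag bits. */
static unsigned long model_get_id(const struct pc_model *pc)
{
	return pc->flags >> ID_SHIFT;
}

/* Store a new owner ID while preserving the low flag bits. */
static void model_set_id(struct pc_model *pc, unsigned long id)
{
	assert(id < (1UL << ID_BITS));
	pc->flags = (pc->flags & FLAG_MASK) | (id << ID_SHIFT);
}

/* Hypothetical helper: if the page is still tagged with the ID of a
 * removed cgroup, reset it to 0 ("owner cannot be retrieved"). */
static void model_clear_id_if(struct pc_model *pc, unsigned long dead_id)
{
	if (model_get_id(pc) == dead_id)
		model_set_id(pc, 0);
}
```

Without something along these lines, a page tagged before the cgroup was
removed would keep reporting the old ID, which might later be reused by
an unrelated cgroup. The real code would of course need to take
lock_page_cgroup() around the update.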
>
> A owner can be identified using a generic ID number and the following
> interfaces are provided to store a retrieve this information:
>
> unsigned long page_cgroup_get_owner(struct page *page);
> int page_cgroup_set_owner(struct page *page, unsigned long id);
> int page_cgroup_copy_owner(struct page *npage, struct page *opage);
>
> The io-throttle controller uses the cgroup css_id() as the owner's ID
> number.
>
> A big part of this code is taken from the Ryo and Hirokazu's bio-cgroup
> controller (http://people.valinux.co.jp/~ryov/bio-cgroup/).
>
> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
> Signed-off-by: Ryo Tsuruta <ryov@valinux.co.jp>
> ---
> include/linux/memcontrol.h | 6 +++
> include/linux/mmzone.h | 4 +-
> include/linux/page_cgroup.h | 33 +++++++++++++-
> init/Kconfig | 4 ++
> mm/Makefile | 3 +-
> mm/memcontrol.c | 6 +++
> mm/page_cgroup.c | 95 ++++++++++++++++++++++++++++++++++++++-----
> 7 files changed, 135 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 18146c9..f3e0e64 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -37,6 +37,8 @@ struct mm_struct;
> * (Of course, if memcg does memory allocation in future, GFP_KERNEL is sane.)
> */
>
> +extern void __init_mem_page_cgroup(struct page_cgroup *pc);
> +
> extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
> gfp_t gfp_mask);
> /* for swap handling */
> @@ -120,6 +122,10 @@ extern bool mem_cgroup_oom_called(struct task_struct *task);
> #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> struct mem_cgroup;
>
> +static inline void __init_mem_page_cgroup(struct page_cgroup *pc)
> +{
> +}
> +
> static inline int mem_cgroup_newpage_charge(struct page *page,
> struct mm_struct *mm, gfp_t gfp_mask)
> {
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 186ec6a..b178eb9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -607,7 +607,7 @@ typedef struct pglist_data {
> int nr_zones;
> #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
> struct page *node_mem_map;
> -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +#ifdef CONFIG_PAGE_TRACKING
> struct page_cgroup *node_page_cgroup;
> #endif
> #endif
> @@ -958,7 +958,7 @@ struct mem_section {
>
> /* See declaration of similar field in struct zone */
> unsigned long *pageblock_flags;
> -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +#ifdef CONFIG_PAGE_TRACKING
> /*
> * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> * section. (see memcontrol.h/page_cgroup.h about this.)
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..f24d081 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -1,7 +1,7 @@
> #ifndef __LINUX_PAGE_CGROUP_H
> #define __LINUX_PAGE_CGROUP_H
>
> -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +#ifdef CONFIG_PAGE_TRACKING
> #include <linux/bit_spinlock.h>
> /*
> * Page Cgroup can be considered as an extended mem_map.
> @@ -12,11 +12,38 @@
> */
> struct page_cgroup {
> unsigned long flags;
> - struct mem_cgroup *mem_cgroup;
> struct page *page;
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> + struct mem_cgroup *mem_cgroup;
> struct list_head lru; /* per cgroup LRU list */
> +#endif
> };
>
> +/*
> + * use lower 16 bits for flags and reserve the rest for the page tracking id
> + */
> +#define PAGE_TRACKING_ID_SHIFT (16)
> +#define PAGE_TRACKING_ID_BITS \
> + (8 * sizeof(unsigned long) - PAGE_TRACKING_ID_SHIFT)
> +
> +/* NOTE: must be called with page_cgroup() held */
> +static inline unsigned long page_cgroup_get_id(struct page_cgroup *pc)
> +{
> + return pc->flags >> PAGE_TRACKING_ID_SHIFT;
> +}
> +
> +/* NOTE: must be called with page_cgroup() held */
> +static inline void page_cgroup_set_id(struct page_cgroup *pc, unsigned long id)
> +{
> + WARN_ON(id >= (1UL << PAGE_TRACKING_ID_BITS));
> + pc->flags &= (1UL << PAGE_TRACKING_ID_SHIFT) - 1;
> + pc->flags |= (unsigned long)(id << PAGE_TRACKING_ID_SHIFT);
> +}
> +
> +unsigned long page_cgroup_get_owner(struct page *page);
> +int page_cgroup_set_owner(struct page *page, unsigned long id);
> +int page_cgroup_copy_owner(struct page *npage, struct page *opage);
> +
> void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
> void __init page_cgroup_init(void);
> struct page_cgroup *lookup_page_cgroup(struct page *page);
> @@ -71,7 +98,7 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
> bit_spin_unlock(PCG_LOCK, &pc->flags);
> }
>
> -#else /* CONFIG_CGROUP_MEM_RES_CTLR */
> +#else /* CONFIG_PAGE_TRACKING */
> struct page_cgroup;
>
> static inline void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
> diff --git a/init/Kconfig b/init/Kconfig
> index 7be4d38..5428ac7 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -569,6 +569,7 @@ config CGROUP_MEM_RES_CTLR
> bool "Memory Resource Controller for Control Groups"
> depends on CGROUPS && RESOURCE_COUNTERS
> select MM_OWNER
> + select PAGE_TRACKING
> help
> Provides a memory resource controller that manages both anonymous
> memory and page cache. (See Documentation/cgroups/memory.txt)
> @@ -611,6 +612,9 @@ endif # CGROUPS
> config MM_OWNER
> bool
>
> +config PAGE_TRACKING
> + bool
> +
> config SYSFS_DEPRECATED
> bool
>
> diff --git a/mm/Makefile b/mm/Makefile
> index ec73c68..b94e074 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -37,4 +37,5 @@ else
> obj-$(CONFIG_SMP) += allocpercpu.o
> endif
> obj-$(CONFIG_QUICKLIST) += quicklist.o
> -obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
> +obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o
> +obj-$(CONFIG_PAGE_TRACKING) += page_cgroup.o
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e44fb0f..69d1c31 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2524,6 +2524,12 @@ struct cgroup_subsys mem_cgroup_subsys = {
> .use_id = 1,
> };
>
> +void __meminit __init_mem_page_cgroup(struct page_cgroup *pc)
> +{
> + pc->mem_cgroup = NULL;
> + INIT_LIST_HEAD(&pc->lru);
> +}
> +
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>
> static int __init disable_swap_account(char *s)
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 791905c..b3b394c 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -3,6 +3,7 @@
> #include <linux/bootmem.h>
> #include <linux/bit_spinlock.h>
> #include <linux/page_cgroup.h>
> +#include <linux/blk-io-throttle.h>
> #include <linux/hash.h>
> #include <linux/slab.h>
> #include <linux/memory.h>
> @@ -14,9 +15,8 @@ static void __meminit
> __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
> {
> pc->flags = 0;
> - pc->mem_cgroup = NULL;
> pc->page = pfn_to_page(pfn);
> - INIT_LIST_HEAD(&pc->lru);
> + __init_mem_page_cgroup(pc);
> }
> static unsigned long total_usage;
>
> @@ -74,7 +74,7 @@ void __init page_cgroup_init(void)
>
> int nid, fail;
>
> - if (mem_cgroup_disabled())
> + if (mem_cgroup_disabled() && iothrottle_disabled())
> return;
>
> for_each_online_node(nid) {
> @@ -83,12 +83,13 @@ void __init page_cgroup_init(void)
> goto fail;
> }
> printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
> - printk(KERN_INFO "please try cgroup_disable=memory option if you"
> - " don't want\n");
> + printk(KERN_INFO
> + "try cgroup_disable=memory,blockio option if you don't want\n");
> return;
> fail:
> printk(KERN_CRIT "allocation of page_cgroup was failed.\n");
> - printk(KERN_CRIT "please try cgroup_disable=memory boot option\n");
> + printk(KERN_CRIT
> + "try cgroup_disable=memory,blockio boot option\n");
> panic("Out of memory");
> }
>
> @@ -243,12 +244,85 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
>
> #endif
>
> +/**
> + * page_cgroup_get_owner() - get the owner ID of a page
> + * @page: the page we want to find the owner
> + *
> + * Returns the owner ID of the page, 0 means that the owner cannot be
> + * retrieved.
> + **/
> +unsigned long page_cgroup_get_owner(struct page *page)
> +{
> + struct page_cgroup *pc;
> + unsigned long ret;
> +
> + pc = lookup_page_cgroup(page);
> + if (unlikely(!pc))
> + return 0;
> +
> + lock_page_cgroup(pc);
> + ret = page_cgroup_get_id(pc);
> + unlock_page_cgroup(pc);
> + return ret;
> +}
> +
> +/**
> + * page_cgroup_set_owner() - set the owner ID of a page
> + * @page: the page we want to tag
> + * @id: the ID number that will be associated to page
> + *
> + * Returns 0 if the owner is correctly associated to the page. Returns a
> + * negative value in case of failure.
> + **/
> +int page_cgroup_set_owner(struct page *page, unsigned long id)
> +{
> + struct page_cgroup *pc;
> +
> + pc = lookup_page_cgroup(page);
> + if (unlikely(!pc))
> + return -ENOENT;
> +
> + lock_page_cgroup(pc);
> + page_cgroup_set_id(pc, id);
> + unlock_page_cgroup(pc);
> + return 0;
> +}
> +
> +/**
> + * page_cgroup_copy_owner() - copy the owner ID of a page into another page
> + * @npage: the page where we want to copy the owner
> + * @opage: the page from which we want to copy the ID
> + *
> + * Returns 0 if the owner is correctly associated to npage. Returns a negative
> + * value in case of failure.
> + **/
> +int page_cgroup_copy_owner(struct page *npage, struct page *opage)
> +{
> + struct page_cgroup *npc, *opc;
> + unsigned long id;
> +
> + npc = lookup_page_cgroup(npage);
> + if (unlikely(!npc))
> + return -ENOENT;
> + opc = lookup_page_cgroup(opage);
> + if (unlikely(!opc))
> + return -ENOENT;
> + lock_page_cgroup(opc);
> + lock_page_cgroup(npc);
> + id = page_cgroup_get_id(opc);
> + page_cgroup_set_id(npc, id);
> + unlock_page_cgroup(npc);
> + unlock_page_cgroup(opc);
> +
> + return 0;
> +}
> +
> void __init page_cgroup_init(void)
> {
> unsigned long pfn;
> int fail = 0;
>
> - if (mem_cgroup_disabled())
> + if (mem_cgroup_disabled() && iothrottle_disabled())
> return;
>
> for (pfn = 0; !fail && pfn < max_pfn; pfn += PAGES_PER_SECTION) {
> @@ -257,14 +331,15 @@ void __init page_cgroup_init(void)
> fail = init_section_page_cgroup(pfn);
> }
> if (fail) {
> - printk(KERN_CRIT "try cgroup_disable=memory boot option\n");
> + printk(KERN_CRIT
> + "try cgroup_disable=memory,blockio boot option\n");
> panic("Out of memory");
> } else {
> hotplug_memory_notifier(page_cgroup_callback, 0);
> }
> printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
> - printk(KERN_INFO "please try cgroup_disable=memory option if you don't"
> - " want\n");
> + printk(KERN_INFO
> + "try cgroup_disable=memory,blockio option if you don't want\n");
> }
>
> void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
--
Regards
Gui Jianfeng