* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
@ 2009-05-15 17:45 KAMEZAWA Hiroyuki
  2009-05-15 18:16 ` Balbir Singh
  2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
  0 siblings, 2 replies; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-15 17:45 UTC
  To: balbir
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
      KAMEZAWA Hiroyuki, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
      menage@google.com, KOSAKI Motohiro

Balbir Singh wrote:
> Feature: Remove the overhead associated with the root cgroup
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> This patch changes the memory cgroup and removes the overhead associated
> with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> can no longer set a memory hard limit in the root cgroup.
>
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well.
>
> Review comments highly appreciated
>
> Tests
>
> 1. Tested with allocate, touch and limit test case for a non-root cgroup
> 2. For the root cgroup tested performance impact with reaim
>
>
>                 +patch          mmtom-08-may-2009
> AIM9            1362.93         1338.17
> Dbase           17457.75        16021.58
> New Dbase       18070.18        16518.54
> Shared          9681.85         8882.11
> Compute         16197.79        15226.13
>
Hmm, at first impression, I'm not convinced by the numbers...
Just avoiding list_add/del makes programs _10%_ faster ?
Could you show changes in cpu cache-miss rate if you can ?
(And why Aim9 goes bad ?)

Hmm, page_cgroup_zoneinfo() is accessed anyway, then...per zone counter
is not a problem here..

Could you show your .config and environment ?
When I trust above numbers, it seems there is more optimization/
prefetch point in usual path

BTW, how the performance changes in children(not default) groups ?

> 3. Tested accounting in root cgroup to make sure it looks sane and
> correct.
>
Not sure but swap and shmem case should be checked carefully..

> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
>  include/linux/page_cgroup.h |   10 ++++++++++
>  mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 36 insertions(+), 4 deletions(-)
>
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..8b85752 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>      PCG_LOCK,  /* page cgroup is locked */
>      PCG_CACHE, /* charged as cache */
>      PCG_USED, /* this object is in use. */
> +    PCG_ROOT, /* page belongs to root cgroup */
> +    PCG_ACCT, /* page has been accounted for */

Reading the code, this PCG_ACCT should be PCG_AcctLRU.

>  };
>
>  #define TESTPCGFLAG(uname, lname) \
> @@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(Acct, ACCT)
> +CLEARPCGFLAG(Acct, ACCT)
> +TESTPCGFLAG(Acct, ACCT)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>      return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9712ef7..18d2819 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES 5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
> @@ -196,6 +197,10 @@ enum charge_type {
>  #define PCGF_CACHE (1UL << PCG_CACHE)
>  #define PCGF_USED (1UL << PCG_USED)
>  #define PCGF_LOCK (1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT (1UL << PCG_ROOT)
> +#define PCGF_ACCT (1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>      PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>          return;
>      pc = lookup_page_cgroup(page);
>      /* can happen while we handle swapcache. */
> -    if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +    if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>          return;
>      /*
>       * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>      mz = page_cgroup_zoneinfo(pc);
>      mem = pc->mem_cgroup;
>      MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +    ClearPageCgroupAcct(pc);
> +    if (PageCgroupRoot(pc))
> +        return;
>      list_del_init(&pc->lru);
>      return;
>  }
> @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>       * For making pc->mem_cgroup visible, insert smp_rmb() here.
>       */
>      smp_rmb();
> -    /* unused page is not rotated. */
> -    if (!PageCgroupUsed(pc))
> +    /* unused or root page is not rotated. */
> +    if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>          return;
>      mz = page_cgroup_zoneinfo(pc);
>      list_move(&pc->lru, &mz->lists[lru]);
> @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>
>      mz = page_cgroup_zoneinfo(pc);
>      MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +    SetPageCgroupAcct(pc);
> +    if (PageCgroupRoot(pc))
> +        return;
>      list_add(&pc->lru, &mz->lists[lru]);
>  }

I think set/clear flag here adds a race condition....because pc->flags is
modified by
   pc->flags = pcg_default_flags[ctype] in commit_charge()
you have to modify above lines to be

   SetPageCgroupCache(pc) or some..
   ...
   SetPageCgroupUsed(pc)

Then, you can use set_bit() without lock_page_cgroup().
(Currently, pc->flags is modified only under lock_page_cgroup(), so,
 non atomic code is used.)

Regards,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
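
The race raised at the end of the message above can be modeled in userspace. What follows is a minimal sketch, not kernel code: `struct pc_model`, `lru_add()`, `commit_charge_assign()`, `commit_charge_setbits()` and `acct_survives()` are hypothetical names, C11 atomics stand in for the kernel's `set_bit()`, and the ordering replayed here (LRU add first, charge commit second) is just one interleaving assumed for illustration. It shows why a whole-word assignment to the flags can drop a bit that another path has already set, while setting individual bits does not.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers modeled on the enum in the quoted patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };
#define PCGF(bit) (1UL << (bit))

/* Hypothetical stand-in for struct page_cgroup; only the flags word matters. */
struct pc_model {
    atomic_ulong flags;
};

/* LRU side: models SetPageCgroupAcct() as an atomic bit set. */
static void lru_add(struct pc_model *pc)
{
    atomic_fetch_or(&pc->flags, PCGF(PCG_ACCT));
}

/* Charge side, whole-word style: overwrite the flags with the defaults. */
static void commit_charge_assign(struct pc_model *pc, unsigned long defaults)
{
    atomic_store(&pc->flags, defaults);
}

/* Charge side, per-bit style: set only the bits that belong to the charge. */
static void commit_charge_setbits(struct pc_model *pc, unsigned long defaults)
{
    atomic_fetch_or(&pc->flags, defaults);
}

/*
 * Replays one ordering: LRU add first, then the charge commit.
 * Returns 1 if PCG_ACCT is still set afterwards, 0 if it was lost.
 */
static int acct_survives(int use_setbits)
{
    struct pc_model pc;
    unsigned long defaults = PCGF(PCG_CACHE) | PCGF(PCG_USED) | PCGF(PCG_LOCK);

    atomic_init(&pc.flags, 0);
    lru_add(&pc);
    if (use_setbits)
        commit_charge_setbits(&pc, defaults);
    else
        commit_charge_assign(&pc, defaults);

    /* Either style must leave the page marked as used. */
    assert(atomic_load(&pc.flags) & PCGF(PCG_USED));
    return (atomic_load(&pc.flags) & PCGF(PCG_ACCT)) != 0;
}
```

In this model `acct_survives(0)` reports that the whole-word assignment wiped the accounting bit, and `acct_survives(1)` reports that the per-bit variant kept it. Whether the kernel can actually hit this ordering depends on the locking around the charge and LRU paths, which the thread leaves open.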
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 17:45 [RFC] Low overhead patches for the memory cgroup controller (v2) KAMEZAWA Hiroyuki
@ 2009-05-15 18:16 ` Balbir Singh
  2009-05-18 10:11   ` KAMEZAWA Hiroyuki
  2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-15 18:16 UTC
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
      nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com,
      KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> Balbir Singh wrote:
> > Feature: Remove the overhead associated with the root cgroup
> >
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > This patch changes the memory cgroup and removes the overhead associated
> > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > can no longer set a memory hard limit in the root cgroup.
> >
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well.
> >
> > Review comments highly appreciated
> >
> > Tests
> >
> > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > 2. For the root cgroup tested performance impact with reaim
> >
> >
> >                 +patch          mmtom-08-may-2009
> > AIM9            1362.93         1338.17
> > Dbase           17457.75        16021.58
> > New Dbase       18070.18        16518.54
> > Shared          9681.85         8882.11
> > Compute         16197.79        15226.13
> >
> Hmm, at first impression, I'm not convinced by the numbers...
> Just avoiding list_add/del makes programs _10%_ faster ?
> Could you show changes in cpu cache-miss rate if you can ?
> (And why Aim9 goes bad ?)

OK... I'll try but I am away on travel for 3 weeks :( you can try and run
this as well

> Hmm, page_cgroup_zoneinfo() is accessed anyway, then...per zone counter
> is not a problem here..
>
> Could you show your .config and environment ?
> When I trust above numbers, it seems there is more optimization/
> prefetch point in usual path
>
> BTW, how the performance changes in children(not default) groups ?
>

I've not seen the impact of that. I'll try.

> > 3. Tested accounting in root cgroup to make sure it looks sane and
> > correct.
> >
> Not sure but swap and shmem case should be checked carefully..
>
>
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> >
> >  include/linux/page_cgroup.h |   10 ++++++++++
> >  mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 36 insertions(+), 4 deletions(-)
> >
> >
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..8b85752 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >      PCG_LOCK,  /* page cgroup is locked */
> >      PCG_CACHE, /* charged as cache */
> >      PCG_USED, /* this object is in use. */
> > +    PCG_ROOT, /* page belongs to root cgroup */
> > +    PCG_ACCT, /* page has been accounted for */
> Reading the code, this PCG_ACCT should be PCG_AcctLRU.

OK

>
> >  };
> >
> >  #define TESTPCGFLAG(uname, lname) \
> > @@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> >
> > +SETPCGFLAG(Root, ROOT)
> > +CLEARPCGFLAG(Root, ROOT)
> > +TESTPCGFLAG(Root, ROOT)
> > +
> > +SETPCGFLAG(Acct, ACCT)
> > +CLEARPCGFLAG(Acct, ACCT)
> > +TESTPCGFLAG(Acct, ACCT)
> > +
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> >      return page_to_nid(pc->page);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 9712ef7..18d2819 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES 5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
> > @@ -196,6 +197,10 @@ enum charge_type {
> >  #define PCGF_CACHE (1UL << PCG_CACHE)
> >  #define PCGF_USED (1UL << PCG_USED)
> >  #define PCGF_LOCK (1UL << PCG_LOCK)
> > +/* Not used, but added here for completeness */
> > +#define PCGF_ROOT (1UL << PCG_ROOT)
> > +#define PCGF_ACCT (1UL << PCG_ACCT)
> > +
> >  static const unsigned long
> >  pcg_default_flags[NR_CHARGE_TYPE] = {
> >      PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> > @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >          return;
> >      pc = lookup_page_cgroup(page);
> >      /* can happen while we handle swapcache. */
> > -    if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +    if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> >          return;
> >      /*
> >       * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >      mz = page_cgroup_zoneinfo(pc);
> >      mem = pc->mem_cgroup;
> >      MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +    ClearPageCgroupAcct(pc);
> > +    if (PageCgroupRoot(pc))
> > +        return;
> >      list_del_init(&pc->lru);
> >      return;
> >  }
> >
> > @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> >       * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >       */
> >      smp_rmb();
> > -    /* unused page is not rotated. */
> > -    if (!PageCgroupUsed(pc))
> > +    /* unused or root page is not rotated. */
> > +    if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> >          return;
> >      mz = page_cgroup_zoneinfo(pc);
> >      list_move(&pc->lru, &mz->lists[lru]);
> > @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >
> >      mz = page_cgroup_zoneinfo(pc);
> >      MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +    SetPageCgroupAcct(pc);
> > +    if (PageCgroupRoot(pc))
> > +        return;
> >      list_add(&pc->lru, &mz->lists[lru]);
> >  }
> I think set/clear flag here adds a race condition....because pc->flags is
> modified by
>    pc->flags = pcg_default_flags[ctype] in commit_charge()
> you have to modify above lines to be
>
>    SetPageCgroupCache(pc) or some..
>    ...
>    SetPageCgroupUsed(pc)

Good Point

>
> Then, you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so,
>  non atomic code is used.)

OK.. I wonder if we can say the _ACCT and _ROOT flags are protected by
zone->lru_lock. I have not fully looked at the locks held in
commit_charge, but we could potentially do that. Need some more thinking.

--
	Balbir
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 18:16 ` Balbir Singh
@ 2009-05-18 10:11   ` KAMEZAWA Hiroyuki
  2009-05-18 10:45     ` Balbir Singh
  2009-05-31 23:51     ` Balbir Singh
  0 siblings, 2 replies; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-18 10:11 UTC
  To: balbir
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
      nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com,
      KOSAKI Motohiro

On Fri, 15 May 2009 23:46:39 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
>
> > Balbir Singh wrote:
> > > Feature: Remove the overhead associated with the root cgroup
> > >
> > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > >
> > > This patch changes the memory cgroup and removes the overhead associated
> > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > can no longer set a memory hard limit in the root cgroup.
> > >
> > > A new flag is used to track page_cgroup associated with the root cgroup
> > > pages. A new flag to track whether the page has been accounted or not
> > > has been added as well.
> > >
> > > Review comments highly appreciated
> > >
> > > Tests
> > >
> > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > 2. For the root cgroup tested performance impact with reaim
> > >
> > >
> > >                 +patch          mmtom-08-may-2009
> > > AIM9            1362.93         1338.17
> > > Dbase           17457.75        16021.58
> > > New Dbase       18070.18        16518.54
> > > Shared          9681.85         8882.11
> > > Compute         16197.79        15226.13
> > >
> > Hmm, at first impression, I'm not convinced by the numbers...
> > Just avoiding list_add/del makes programs _10%_ faster ?
> > Could you show changes in cpu cache-miss rate if you can ?
> > (And why Aim9 goes bad ?)
>
> OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> this as well
>
tested aim7 with some config.

CPU: Xeon 3.1GHz/4Core x2 (8cpu)
Memory: 32G
HDD: Usual? SCSI disk (just 1 disk)
(try_to_free_pages() etc...will never be called.)

Multiuser config. #of tasks 1100 (near to peak on my host)

10 runs.
rc6mm1 score (Jobs/min)
44009.1 44844.5 44691.1 43981.9 44992.6
44544.9 44179.1 44283.0 44442.9 45033.8 average=44500

+patch
44656.8 44270.8 44706.7 44106.1 44467.6
44585.3 44167.0 44756.7 44853.9 44249.4 average=44482

Dbase config. #of tasks 25
rc6mm1 score (jobs/min)
11022.7 11018.9 11037.9 11003.8 11087.5
11145.2 11133.6 11068.3 11091.3 11106.6 average=11071

+patch
10888.0 10973.7 10913.9 11000.0 10984.9
10996.2 10969.9 10921.3 10921.3 11053.1 average=10962

Hmm, 1% improvement ?
(I think this is reasonable score of the effect of this patch)

Anyway, I'm afraid of difference between mine and your kernel config.
plz enjoy your travel for now :)

Thanks,
-Kame
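
The averages and the "1%" figure in the message above can be re-derived with a few lines of arithmetic. This is only a sketch: the helper names `mean()` and `rel_diff_percent()` and the array names are assumptions made for illustration, the scores are copied from the Dbase runs as posted, and the row labels are taken as given without deciding which kernel each row belongs to.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n scores; n must be non-zero. */
static double mean(const double *scores, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += scores[i];
    return sum / (double)n;
}

/* Signed difference of b relative to a, in percent. */
static double rel_diff_percent(double a, double b)
{
    return (b - a) / a * 100.0;
}

/* Dbase scores (jobs/min), row labelled "rc6mm1" in the message. */
static const double dbase_rc6mm1[10] = {
    11022.7, 11018.9, 11037.9, 11003.8, 11087.5,
    11145.2, 11133.6, 11068.3, 11091.3, 11106.6,
};

/* Dbase scores (jobs/min), row labelled "+patch" in the message. */
static const double dbase_patch[10] = {
    10888.0, 10973.7, 10913.9, 11000.0, 10984.9,
    10996.2, 10969.9, 10921.3, 10921.3, 11053.1,
};
```

Feeding both arrays to `mean()` and the two results to `rel_diff_percent()` gives a gap of roughly one percent between the rows, which matches the magnitude quoted in the message. Ten runs per row is a small sample, so the spread within each row is worth looking at before reading much into the gap.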
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:11   ` KAMEZAWA Hiroyuki
@ 2009-05-18 10:45     ` Balbir Singh
  2009-05-18 16:01       ` KAMEZAWA Hiroyuki
  2009-05-31 23:51     ` Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-18 10:45 UTC
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
      nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com,
      KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> >
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > >                 +patch          mmtom-08-may-2009
> > > > AIM9            1362.93         1338.17
> > > > Dbase           17457.75        16021.58
> > > > New Dbase       18070.18        16518.54
> > > > Shared          9681.85         8882.11
> > > > Compute         16197.79        15226.13
> > > >
> > > Hmm, at first impression, I'm not convinced by the numbers...
> > > Just avoiding list_add/del makes programs _10%_ faster ?
> > > Could you show changes in cpu cache-miss rate if you can ?
> > > (And why Aim9 goes bad ?)
> >
> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > this as well
> >
> tested aim7 with some config.
>
> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> Memory: 32G
> HDD: Usual? SCSI disk (just 1 disk)
> (try_to_free_pages() etc...will never be called.)
>
> Multiuser config. #of tasks 1100 (near to peak on my host)
>
> 10 runs.
> rc6mm1 score (Jobs/min)
> 44009.1 44844.5 44691.1 43981.9 44992.6
> 44544.9 44179.1 44283.0 44442.9 45033.8 average=44500
>
> +patch
> 44656.8 44270.8 44706.7 44106.1 44467.6
> 44585.3 44167.0 44756.7 44853.9 44249.4 average=44482
>
> Dbase config. #of tasks 25
> rc6mm1 score (jobs/min)
> 11022.7 11018.9 11037.9 11003.8 11087.5
> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
>
> +patch
> 10888.0 10973.7 10913.9 11000.0 10984.9
> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
>
> Hmm, 1% improvement ?
> (I think this is reasonable score of the effect of this patch)
>

Thanks for the test. I have a 4 CPU system and I create 80 users;
a larger config shows a larger difference at my end. I think even 1% is
quite reasonable as you mentioned. If the patch looks fine, should we
ask for larger testing by Andrew?

> Anyway, I'm afraid of difference between mine and your kernel config.
> plz enjoy your travel for now :)

Sorry, I did not send you my .config. Why do you think .config makes a
difference? I think loading AIM makes the difference, and I also made
one other change to the aim tests: I run with "sync" linked to
/bin/true, use tmpfs for the temporary partition, and use 20 * number of
cpus for the number of users.

If required, I can still send out my .config to you.

--
	Balbir
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:45     ` Balbir Singh
@ 2009-05-18 16:01       ` KAMEZAWA Hiroyuki
  2009-05-19 13:18         ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-18 16:01 UTC
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
      Andrew Morton, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
      menage@google.com, KOSAKI Motohiro

Balbir Singh wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:
>
>> On Fri, 15 May 2009 23:46:39 +0530
>> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>
>> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
>> >
>> > > Balbir Singh wrote:
>> > > > Feature: Remove the overhead associated with the root cgroup
>> > > >
>> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
>> > > >
>> > > > This patch changes the memory cgroup and removes the overhead associated
>> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
>> > > > can no longer set a memory hard limit in the root cgroup.
>> > > >
>> > > > A new flag is used to track page_cgroup associated with the root cgroup
>> > > > pages. A new flag to track whether the page has been accounted or not
>> > > > has been added as well.
>> > > >
>> > > > Review comments highly appreciated
>> > > >
>> > > > Tests
>> > > >
>> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
>> > > > 2. For the root cgroup tested performance impact with reaim
>> > > >
>> > > >
>> > > >                 +patch          mmtom-08-may-2009
>> > > > AIM9            1362.93         1338.17
>> > > > Dbase           17457.75        16021.58
>> > > > New Dbase       18070.18        16518.54
>> > > > Shared          9681.85         8882.11
>> > > > Compute         16197.79        15226.13
>> > > >
>> > > Hmm, at first impression, I'm not convinced by the numbers...
>> > > Just avoiding list_add/del makes programs _10%_ faster ?
>> > > Could you show changes in cpu cache-miss rate if you can ?
>> > > (And why Aim9 goes bad ?)
>> >
>> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
>> > this as well
>> >
>> tested aim7 with some config.
>>
>> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
>> Memory: 32G
>> HDD: Usual? SCSI disk (just 1 disk)
>> (try_to_free_pages() etc...will never be called.)
>>
>> Multiuser config. #of tasks 1100 (near to peak on my host)
>>
>> 10 runs.
>> rc6mm1 score (Jobs/min)
>> 44009.1 44844.5 44691.1 43981.9 44992.6
>> 44544.9 44179.1 44283.0 44442.9 45033.8 average=44500
>>
>> +patch
>> 44656.8 44270.8 44706.7 44106.1 44467.6
>> 44585.3 44167.0 44756.7 44853.9 44249.4 average=44482
>>
>> Dbase config. #of tasks 25
>> rc6mm1 score (jobs/min)
>> 11022.7 11018.9 11037.9 11003.8 11087.5
>> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
>>
>> +patch
>> 10888.0 10973.7 10913.9 11000.0 10984.9
>> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
>>
>> Hmm, 1% improvement ?
>> (I think this is reasonable score of the effect of this patch)
>>
>
> Thanks for the test, I have a 4 CPU system and I create 80 users,
> larger config shows larger difference at my end.
Sorry, above Dbase test was on 54 threads. I'll try 20*8=160 threads.

> I think even 1% is
> quite reasonable as you mentioned. If the patch looks fine, should we
> ask for larger testing by Andrew?
>
Hmm, as you like. My interest is bugfix for swap leaking now.
Because this change adds a big special case, we need many tests, anyway.
And please show the _environment_ where the benchmarks run.
BTW, I wonder whether we can have more improvements in this special case...

>> Anyway, I'm afraid of difference between mine and your kernel config.
>> plz enjoy your travel for now :)
>
> Sorry, I did not send you my .config, why do you think .config makes a
> difference?
I wanted to know what kind of DEBUG/TRACE config is on, and some others.

> I think loading AIM makes the difference and I also made
> one other change to the aim tests. I run with "sync" linked to
> /bin/true and use tmpfs for temporary partition and 20*number of cpus
> for number of users.
>
Is that the usual method of using AIM ? (Sorry, I'm not sure).
It seems to break AIM7's purpose of "measuring typical workload"...

> If required, I can still send out my .config to you.
>
If you can, plz. (just for my interest ;)

Thanks,
-Kame
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 16:01       ` KAMEZAWA Hiroyuki
@ 2009-05-19 13:18         ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2009-05-19 13:18 UTC
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
      nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com,
      KOSAKI Motohiro

[-- Attachment #1: Type: text/plain, Size: 4556 bytes --]

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-19 01:01:00]:

> Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:
> >
> >> On Fri, 15 May 2009 23:46:39 +0530
> >> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >>
> >> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> >> >
> >> > > Balbir Singh wrote:
> >> > > > Feature: Remove the overhead associated with the root cgroup
> >> > > >
> >> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >> > > >
> >> > > > This patch changes the memory cgroup and removes the overhead associated
> >> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> >> > > > can no longer set a memory hard limit in the root cgroup.
> >> > > >
> >> > > > A new flag is used to track page_cgroup associated with the root cgroup
> >> > > > pages. A new flag to track whether the page has been accounted or not
> >> > > > has been added as well.
> >> > > >
> >> > > > Review comments highly appreciated
> >> > > >
> >> > > > Tests
> >> > > >
> >> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> >> > > > 2. For the root cgroup tested performance impact with reaim
> >> > > >
> >> > > >
> >> > > >                 +patch          mmtom-08-may-2009
> >> > > > AIM9            1362.93         1338.17
> >> > > > Dbase           17457.75        16021.58
> >> > > > New Dbase       18070.18        16518.54
> >> > > > Shared          9681.85         8882.11
> >> > > > Compute         16197.79        15226.13
> >> > > >
> >> > > Hmm, at first impression, I'm not convinced by the numbers...
> >> > > Just avoiding list_add/del makes programs _10%_ faster ?
> >> > > Could you show changes in cpu cache-miss rate if you can ?
> >> > > (And why Aim9 goes bad ?)
> >> >
> >> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> >> > this as well
> >> >
> >> tested aim7 with some config.
> >>
> >> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> >> Memory: 32G
> >> HDD: Usual? SCSI disk (just 1 disk)
> >> (try_to_free_pages() etc...will never be called.)
> >>
> >> Multiuser config. #of tasks 1100 (near to peak on my host)
> >>
> >> 10 runs.
> >> rc6mm1 score (Jobs/min)
> >> 44009.1 44844.5 44691.1 43981.9 44992.6
> >> 44544.9 44179.1 44283.0 44442.9 45033.8 average=44500
> >>
> >> +patch
> >> 44656.8 44270.8 44706.7 44106.1 44467.6
> >> 44585.3 44167.0 44756.7 44853.9 44249.4 average=44482
> >>
> >> Dbase config. #of tasks 25
> >> rc6mm1 score (jobs/min)
> >> 11022.7 11018.9 11037.9 11003.8 11087.5
> >> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> >>
> >> +patch
> >> 10888.0 10973.7 10913.9 11000.0 10984.9
> >> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> >>
> >> Hmm, 1% improvement ?
> >> (I think this is reasonable score of the effect of this patch)
> >>
> >
> > Thanks for the test, I have a 4 CPU system and I create 80 users,
> > larger config shows larger difference at my end.
> Sorry, above Dbase test was on 54 threads. I'll try 20*8=160 threads.
>

cool! Thanks

> > I think even 1% is
> > quite reasonable as you mentioned. If the patch looks fine, should we
> > ask for larger testing by Andrew?
> >
> Hmm, as you like. My interest is bugfix for swap leaking now.

I've seen that too.. I think that has been going on for a long time and
I am afraid it is hurting features like soft limit, but bug fixing is
important. Hopefully we'll have a good solution soon.

> Because this change adds a big special case, we need many tests, anyway.
> And please show the _environment_ where the benchmarks run.
> BTW, I wonder whether we can have more improvements in this special case...
>
> >> Anyway, I'm afraid of difference between mine and your kernel config.
> >> plz enjoy your travel for now :)
> >
> > Sorry, I did not send you my .config, why do you think .config makes a
> > difference?
> I wanted to know what kind of DEBUG/TRACE config is on, and some others.
>
> > I think loading AIM makes the difference and I also made
> > one other change to the aim tests. I run with "sync" linked to
> > /bin/true and use tmpfs for temporary partition and 20*number of cpus
> > for number of users.
> >
> Is that the usual method of using AIM ? (Sorry, I'm not sure).
> It seems to break AIM7's purpose of "measuring typical workload"...
>

No.. it is not.. but sync has a large overhead, so I use /bin/true. I
can try without it and report back.

> > If required, I can still send out my .config to you.
> >
> If you can, plz.
(just for my interest ;) > Attached, please see -- Balbir [-- Attachment #2: config-2.6.30-rc4-mm1 --] [-- Type: text/plain, Size: 54827 bytes --] # # Automatically generated make config: don't edit # Linux kernel version: 2.6.30-rc4-mm1 # Wed May 13 17:51:31 2009 # CONFIG_64BIT=y # CONFIG_X86_32 is not set CONFIG_X86_64=y CONFIG_X86=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_FAST_CMPXCHG_LOCAL=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_GENERIC_SPINLOCK=y # CONFIG_RWSEM_XCHGADD_ALGORITHM is not set CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ZONE_DMA32=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_USE_GENERIC_SMP_HELPERS=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_X86_TRAMPOLINE=y # CONFIG_KTIME_SCALAR is not set CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y 
CONFIG_HAVE_KERNEL_LZMA=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # CONFIG_AUDIT is not set # # RCU Subsystem # # CONFIG_CLASSIC_RCU is not set CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set # CONFIG_RCU_TRACE is not set CONFIG_RCU_FANOUT=64 # CONFIG_RCU_FANOUT_EXACT is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_PREEMPT_RCU_TRACE is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_GROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_RT_GROUP_SCHED=y # CONFIG_USER_SCHED is not set CONFIG_CGROUP_SCHED=y CONFIG_CGROUPS=y CONFIG_CGROUP_DEBUG=y CONFIG_CGROUP_NS=y CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_DEVICE=y CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_CPUACCT=y CONFIG_RESOURCE_COUNTERS=y CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y CONFIG_MM_OWNER=y CONFIG_SYSFS_DEPRECATED=y CONFIG_SYSFS_DEPRECATED_V2=y CONFIG_RELAY=y CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_USER_NS is not set # CONFIG_PID_NS is not set # CONFIG_NET_NS is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_PCSPKR_PLATFORM=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_PCI_QUIRKS=y # CONFIG_STRIP_ASM_SYMS is not set 
CONFIG_COMPAT_BRK=y # CONFIG_SLAB_ALLOCATOR is not set # CONFIG_SLUB_ALLOCATOR is not set CONFIG_SLQB_ALLOCATOR=y CONFIG_SLQB=y # CONFIG_SLOB is not set CONFIG_PROFILING=y CONFIG_TRACEPOINTS=y CONFIG_MARKERS=y CONFIG_OPROFILE=m CONFIG_OPROFILE_IBS=y CONFIG_HAVE_OPROFILE=y CONFIG_KPROBES=y CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y CONFIG_KRETPROBES=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_ARCH_TRACEHOOK=y CONFIG_HAVE_DMA_API_DEBUG=y # CONFIG_SLOW_WORK is not set # CONFIG_HAVE_GENERIC_DMA_COHERENT is not set CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_FORCE_LOAD=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_STOP_MACHINE=y CONFIG_UTRACE=y CONFIG_BLOCK=y # CONFIG_BLK_DEV_BSG is not set # CONFIG_BLK_DEV_INTEGRITY is not set CONFIG_BLOCK_COMPAT=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" CONFIG_PREEMPT_NOTIFIERS=y CONFIG_FREEZER=y # # Processor type and features # # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y CONFIG_X86_X2APIC=y # CONFIG_SPARSE_IRQ is not set CONFIG_X86_MPPARSE=y CONFIG_X86_EXTENDED_PLATFORM=y # CONFIG_X86_VSMP is not set # CONFIG_X86_UV is not set CONFIG_SCHED_OMIT_FRAME_POINTER=y # CONFIG_PARAVIRT_GUEST is not set CONFIG_MEMTEST=y # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # 
CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set # CONFIG_MPSC is not set # CONFIG_MCORE2 is not set CONFIG_GENERIC_CPU=y CONFIG_X86_CPU=y CONFIG_X86_L1_CACHE_BYTES=64 CONFIG_X86_INTERNODE_CACHE_BYTES=64 CONFIG_X86_CMPXCHG=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_TSC=y CONFIG_X86_CMPXCHG64=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=64 CONFIG_X86_DEBUGCTLMSR=y CONFIG_CPU_SUP_INTEL=y CONFIG_CPU_SUP_AMD=y CONFIG_CPU_SUP_CENTAUR=y CONFIG_X86_DS=y CONFIG_X86_PTRACE_BTS=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y CONFIG_DMI=y CONFIG_GART_IOMMU=y # CONFIG_CALGARY_IOMMU is not set CONFIG_AMD_IOMMU=y # CONFIG_AMD_IOMMU_STATS is not set CONFIG_SWIOTLB=y CONFIG_IOMMU_HELPER=y CONFIG_IOMMU_API=y # CONFIG_MAXSMP is not set CONFIG_NR_CPUS=32 CONFIG_SCHED_SMT=y CONFIG_SCHED_MC=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y # CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set CONFIG_X86_MCE=y CONFIG_X86_MCE_INTEL=y CONFIG_X86_MCE_AMD=y CONFIG_X86_MCE_THRESHOLD=y # CONFIG_I8K is not set # CONFIG_MICROCODE is not set CONFIG_X86_MSR=y CONFIG_X86_CPUID=y # CONFIG_X86_CPU_DEBUG is not set CONFIG_ARCH_PHYS_ADDR_T_64BIT=y CONFIG_DIRECT_GBPAGES=y CONFIG_NUMA=y # CONFIG_K8_NUMA is not set CONFIG_X86_64_ACPI_NUMA=y CONFIG_NODES_SPAN_OTHER_NODES=y CONFIG_NUMA_EMU=y CONFIG_NODES_SHIFT=6 CONFIG_ARCH_SPARSEMEM_DEFAULT=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_MEMORY_PROBE=y CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000 CONFIG_SELECT_MEMORY_MODEL=y # CONFIG_FLATMEM_MANUAL is not set # CONFIG_DISCONTIGMEM_MANUAL is not set CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y CONFIG_NEED_MULTIPLE_NODES=y CONFIG_HAVE_MEMORY_PRESENT=y 
CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_VMEMMAP=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG_SPARSE=y CONFIG_MEMORY_HOTREMOVE=y CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_MIGRATION=y CONFIG_PHYS_ADDR_T_64BIT=y CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y CONFIG_UNEVICTABLE_LRU=y CONFIG_HAVE_MLOCK=y CONFIG_HAVE_MLOCKED_PAGE_BIT=y CONFIG_MMU_NOTIFIER=y CONFIG_KSM=m # CONFIG_X86_CHECK_BIOS_CORRUPTION is not set CONFIG_X86_RESERVE_LOW_64K=y CONFIG_MTRR=y CONFIG_MTRR_SANITIZER=y CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0 CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1 CONFIG_X86_PAT=y # CONFIG_EFI is not set CONFIG_SECCOMP=y # CONFIG_CC_STACKPROTECTOR is not set # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_300 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=250 # CONFIG_SCHED_HRTICK is not set CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y CONFIG_PHYSICAL_START=0x200000 # CONFIG_RELOCATABLE is not set CONFIG_PHYSICAL_ALIGN=0x200000 CONFIG_HOTPLUG_CPU=y CONFIG_COMPAT_VDSO=y # CONFIG_CMDLINE_BOOL is not set CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y # # Power management and ACPI options # CONFIG_PM=y # CONFIG_PM_DEBUG is not set CONFIG_PM_SLEEP_SMP=y CONFIG_PM_SLEEP=y CONFIG_SUSPEND=y CONFIG_SUSPEND_FREEZER=y # CONFIG_HIBERNATION is not set CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_PROCFS=y CONFIG_ACPI_PROCFS_POWER=y CONFIG_ACPI_SYSFS_POWER=y CONFIG_ACPI_PROC_EVENT=y CONFIG_ACPI_AC=y CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_FAN=y CONFIG_ACPI_DOCK=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_HOTPLUG_CPU=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_NUMA=y # CONFIG_ACPI_CUSTOM_DSDT is not set CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set # CONFIG_ACPI_PCI_SLOT is not set CONFIG_X86_PM_TIMER=y CONFIG_ACPI_CONTAINER=y CONFIG_ACPI_HOTPLUG_MEMORY=y # CONFIG_ACPI_SBS is not set # # CPU Frequency scaling # CONFIG_CPU_FREQ=y 
CONFIG_CPU_FREQ_TABLE=y CONFIG_CPU_FREQ_DEBUG=y CONFIG_CPU_FREQ_STAT=y CONFIG_CPU_FREQ_STAT_DETAILS=y # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set CONFIG_CPU_FREQ_GOV_PERFORMANCE=y CONFIG_CPU_FREQ_GOV_POWERSAVE=y CONFIG_CPU_FREQ_GOV_USERSPACE=y CONFIG_CPU_FREQ_GOV_ONDEMAND=y CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y # # CPUFreq processor drivers # CONFIG_X86_ACPI_CPUFREQ=y # CONFIG_X86_POWERNOW_K8 is not set # CONFIG_X86_SPEEDSTEP_CENTRINO is not set # CONFIG_X86_P4_CLOCKMOD is not set # # shared options # # CONFIG_X86_SPEEDSTEP_LIB is not set CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y # # Memory power savings # # CONFIG_I7300_IDLE is not set # # Bus options (PCI etc.) # CONFIG_PCI=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_DOMAINS=y # CONFIG_DMAR is not set CONFIG_INTR_REMAP=y CONFIG_PCIEPORTBUS=y CONFIG_PCIEAER=y CONFIG_PCIEASPM=y # CONFIG_PCIEASPM_DEBUG is not set CONFIG_ARCH_SUPPORTS_MSI=y CONFIG_PCI_MSI=y CONFIG_PCI_LEGACY=y # CONFIG_PCI_DEBUG is not set # CONFIG_PCI_STUB is not set # CONFIG_HT_IRQ is not set # CONFIG_PCI_IOV is not set CONFIG_ISA_DMA_API=y CONFIG_K8_NB=y # CONFIG_PCCARD is not set # CONFIG_HOTPLUG_PCI is not set # # Executable file formats / Emulations # CONFIG_BINFMT_ELF=y CONFIG_COMPAT_BINFMT_ELF=y # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set # CONFIG_HAVE_AOUT is not set # CONFIG_BINFMT_MISC is not set CONFIG_IA32_EMULATION=y CONFIG_IA32_AOUT=y CONFIG_COMPAT=y CONFIG_COMPAT_FOR_U64_ALIGNMENT=y CONFIG_SYSVIPC_COMPAT=y CONFIG_NET=y # # Networking options # CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set CONFIG_UNIX=y # CONFIG_NET_KEY is not set CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set CONFIG_IP_FIB_HASH=y CONFIG_IP_PNP=y CONFIG_IP_PNP_DHCP=y # CONFIG_IP_PNP_BOOTP is not set # CONFIG_IP_PNP_RARP 
is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set # CONFIG_SYN_COOKIES is not set # CONFIG_INET_AH is not set # CONFIG_INET_ESP is not set # CONFIG_INET_IPCOMP is not set # CONFIG_INET_XFRM_TUNNEL is not set CONFIG_INET_TUNNEL=y # CONFIG_INET_XFRM_MODE_TRANSPORT is not set # CONFIG_INET_XFRM_MODE_TUNNEL is not set # CONFIG_INET_XFRM_MODE_BEET is not set # CONFIG_INET_LRO is not set CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y # CONFIG_TCP_CONG_ADVANCED is not set CONFIG_TCP_CONG_CUBIC=y CONFIG_DEFAULT_TCP_CONG="cubic" # CONFIG_TCP_MD5SIG is not set CONFIG_IPV6=y # CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_ROUTER_PREF is not set # CONFIG_IPV6_OPTIMISTIC_DAD is not set # CONFIG_INET6_AH is not set # CONFIG_INET6_ESP is not set # CONFIG_INET6_IPCOMP is not set # CONFIG_IPV6_MIP6 is not set # CONFIG_INET6_XFRM_TUNNEL is not set # CONFIG_INET6_TUNNEL is not set # CONFIG_INET6_XFRM_MODE_TRANSPORT is not set # CONFIG_INET6_XFRM_MODE_TUNNEL is not set # CONFIG_INET6_XFRM_MODE_BEET is not set # CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set CONFIG_IPV6_SIT=y CONFIG_IPV6_NDISC_NODETYPE=y # CONFIG_IPV6_TUNNEL is not set # CONFIG_IPV6_MULTIPLE_TABLES is not set # CONFIG_IPV6_MROUTE is not set # CONFIG_NETWORK_SECMARK is not set # CONFIG_NETFILTER is not set # CONFIG_IP_DCCP is not set # CONFIG_IP_SCTP is not set # CONFIG_TIPC is not set # CONFIG_ATM is not set CONFIG_STP=m CONFIG_BRIDGE=m # CONFIG_NET_DSA is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set CONFIG_LLC=m # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # CONFIG_PHONET is not set CONFIG_NET_SCHED=y # # Queueing/Scheduling # CONFIG_NET_SCH_CBQ=m # CONFIG_NET_SCH_HTB is not set # CONFIG_NET_SCH_HFSC is not set # CONFIG_NET_SCH_PRIO is not set # CONFIG_NET_SCH_MULTIQ is not set # 
CONFIG_NET_SCH_RED is not set # CONFIG_NET_SCH_SFQ is not set # CONFIG_NET_SCH_TEQL is not set # CONFIG_NET_SCH_TBF is not set # CONFIG_NET_SCH_GRED is not set # CONFIG_NET_SCH_DSMARK is not set # CONFIG_NET_SCH_NETEM is not set # CONFIG_NET_SCH_DRR is not set # # Classification # CONFIG_NET_CLS=y # CONFIG_NET_CLS_BASIC is not set # CONFIG_NET_CLS_TCINDEX is not set # CONFIG_NET_CLS_ROUTE4 is not set # CONFIG_NET_CLS_FW is not set # CONFIG_NET_CLS_U32 is not set # CONFIG_NET_CLS_RSVP is not set # CONFIG_NET_CLS_RSVP6 is not set # CONFIG_NET_CLS_FLOW is not set CONFIG_NET_CLS_CGROUP=y # CONFIG_NET_EMATCH is not set # CONFIG_NET_CLS_ACT is not set CONFIG_NET_SCH_FIFO=y # CONFIG_DCB is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_NET_TCPPROBE is not set # CONFIG_NET_DROP_MONITOR is not set # CONFIG_HAMRADIO is not set # CONFIG_CAN is not set # CONFIG_IRDA is not set # CONFIG_BT is not set # CONFIG_AF_RXRPC is not set CONFIG_WIRELESS=y # CONFIG_CFG80211 is not set # CONFIG_WIRELESS_OLD_REGULATORY is not set # CONFIG_WIRELESS_EXT is not set # CONFIG_LIB80211 is not set # CONFIG_MAC80211 is not set CONFIG_MAC80211_DEFAULT_PS_VALUE=0 # CONFIG_WIMAX is not set # CONFIG_RFKILL is not set # CONFIG_NET_9P is not set # # Device Drivers # # # Generic Driver Options # CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_FIRMWARE_IN_KERNEL=y CONFIG_EXTRA_FIRMWARE="" # CONFIG_DEBUG_DRIVER is not set # CONFIG_DEBUG_DEVRES is not set # CONFIG_SYS_HYPERVISOR is not set CONFIG_CONNECTOR=y CONFIG_PROC_EVENTS=y # CONFIG_MTD is not set # CONFIG_PARPORT is not set CONFIG_PNP=y CONFIG_PNP_DEBUG_MESSAGES=y # # Protocols # CONFIG_PNPACPI=y CONFIG_BLK_DEV=y CONFIG_BLK_DEV_FD=y # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is 
not set # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=4096 # CONFIG_BLK_DEV_XIP is not set # CONFIG_CDROM_PKTCDVD is not set # CONFIG_ATA_OVER_ETH is not set # CONFIG_VIRTIO_BLK is not set # CONFIG_BLK_DEV_HD is not set CONFIG_MISC_DEVICES=y # CONFIG_IBM_ASM is not set # CONFIG_PHANTOM is not set # CONFIG_SGI_IOC4 is not set # CONFIG_TIFM_CORE is not set # CONFIG_ICS932S401 is not set # CONFIG_ENCLOSURE_SERVICES is not set # CONFIG_HP_ILO is not set # CONFIG_ISL29003 is not set # CONFIG_C2PORT is not set # # EEPROM support # # CONFIG_EEPROM_AT24 is not set # CONFIG_EEPROM_LEGACY is not set # CONFIG_EEPROM_MAX6875 is not set # CONFIG_EEPROM_93CX6 is not set CONFIG_HAVE_IDE=y CONFIG_IDE=y # # Please see Documentation/ide/ide.txt for help/info on IDE drives # CONFIG_IDE_XFER_MODE=y CONFIG_IDE_TIMINGS=y CONFIG_IDE_ATAPI=y # CONFIG_BLK_DEV_IDE_SATA is not set CONFIG_IDE_GD=y CONFIG_IDE_GD_ATA=y # CONFIG_IDE_GD_ATAPI is not set CONFIG_BLK_DEV_IDECD=y CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y # CONFIG_BLK_DEV_IDETAPE is not set CONFIG_BLK_DEV_IDEACPI=y # CONFIG_IDE_TASK_IOCTL is not set CONFIG_IDE_PROC_FS=y # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_PLATFORM is not set # CONFIG_BLK_DEV_CMD640 is not set # CONFIG_BLK_DEV_IDEPNP is not set CONFIG_BLK_DEV_IDEDMA_SFF=y # # PCI IDE chipsets support # CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_PCIBUS_ORDER=y # CONFIG_BLK_DEV_OFFBOARD is not set # CONFIG_BLK_DEV_GENERIC is not set # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_RZ1000 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set CONFIG_BLK_DEV_AMD74XX=y CONFIG_BLK_DEV_ATIIXP=y # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT366 is not set # 
CONFIG_BLK_DEV_JMICRON is not set # CONFIG_BLK_DEV_SC1200 is not set CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_IT8172 is not set # CONFIG_BLK_DEV_IT8213 is not set # CONFIG_BLK_DEV_IT821X is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set CONFIG_BLK_DEV_PDC202XX_NEW=y # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_BLK_DEV_TC86C001 is not set CONFIG_BLK_DEV_IDEDMA=y # # SCSI device support # CONFIG_RAID_ATTRS=m CONFIG_SCSI=y CONFIG_SCSI_DMA=y # CONFIG_SCSI_TGT is not set CONFIG_SCSI_NETLINK=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y # CONFIG_CHR_DEV_ST is not set # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y # CONFIG_BLK_DEV_SR_VENDOR is not set CONFIG_CHR_DEV_SG=y # CONFIG_CHR_DEV_SCH is not set # # Some SCSI devices (e.g. CD jukebox) support multiple LUNs # # CONFIG_SCSI_MULTI_LUN is not set CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set # CONFIG_SCSI_SCAN_ASYNC is not set CONFIG_SCSI_WAIT_SCAN=m # # SCSI Transports # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y # CONFIG_SCSI_ISCSI_ATTRS is not set CONFIG_SCSI_SAS_ATTRS=m # CONFIG_SCSI_SAS_LIBSAS is not set # CONFIG_SCSI_SRP_ATTRS is not set CONFIG_SCSI_LOWLEVEL=y # CONFIG_ISCSI_TCP is not set # CONFIG_SCSI_CXGB3_ISCSI is not set # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set CONFIG_SCSI_AIC79XX=y CONFIG_AIC79XX_CMDS_PER_DEVICE=32 CONFIG_AIC79XX_RESET_DELAY_MS=4000 # CONFIG_AIC79XX_DEBUG_ENABLE is not set CONFIG_AIC79XX_DEBUG_MASK=0 # CONFIG_AIC79XX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_AIC94XX is not set # CONFIG_SCSI_MVSAS is not set # CONFIG_SCSI_DPT_I2O is not set # 
CONFIG_SCSI_ADVANSYS is not set # CONFIG_SCSI_ARCMSR is not set # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set # CONFIG_MEGARAID_SAS is not set # CONFIG_SCSI_MPT2SAS is not set # CONFIG_SCSI_HPTIOP is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_LIBFC is not set # CONFIG_LIBFCOE is not set # CONFIG_FCOE is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_STEX is not set # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_QLOGIC_1280 is not set # CONFIG_SCSI_QLA_FC is not set # CONFIG_SCSI_QLA_ISCSI is not set # CONFIG_SCSI_LPFC is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set # CONFIG_SCSI_SRP is not set # CONFIG_SCSI_DH is not set # CONFIG_SCSI_OSD_INITIATOR is not set CONFIG_ATA=m # CONFIG_ATA_NONSTANDARD is not set CONFIG_ATA_ACPI=y CONFIG_SATA_PMP=y CONFIG_SATA_AHCI=m # CONFIG_SATA_SIL24 is not set CONFIG_ATA_SFF=y CONFIG_SATA_SVW=m CONFIG_ATA_PIIX=m # CONFIG_SATA_MV is not set CONFIG_SATA_NV=m # CONFIG_PDC_ADMA is not set # CONFIG_SATA_QSTOR is not set # CONFIG_SATA_PROMISE is not set # CONFIG_SATA_SX4 is not set CONFIG_SATA_SIL=m # CONFIG_SATA_SIS is not set # CONFIG_SATA_ULI is not set CONFIG_SATA_VIA=m # CONFIG_SATA_VITESSE is not set # CONFIG_SATA_INIC162X is not set # CONFIG_PATA_ACPI is not set # CONFIG_PATA_ALI is not set # CONFIG_PATA_AMD is not set # CONFIG_PATA_ARTOP is not set # CONFIG_PATA_ATIIXP is not set # CONFIG_PATA_CMD640_PCI is not set # CONFIG_PATA_CMD64X is not set # CONFIG_PATA_CS5520 is not set # CONFIG_PATA_CS5530 is not set # CONFIG_PATA_CYPRESS is not set # CONFIG_PATA_EFAR is not set # CONFIG_ATA_GENERIC is not set # CONFIG_PATA_HPT366 is not set # CONFIG_PATA_HPT37X is not set # CONFIG_PATA_HPT3X2N is not set # 
CONFIG_PATA_HPT3X3 is not set # CONFIG_PATA_IT821X is not set # CONFIG_PATA_IT8213 is not set # CONFIG_PATA_JMICRON is not set # CONFIG_PATA_TRIFLEX is not set # CONFIG_PATA_MARVELL is not set # CONFIG_PATA_MPIIX is not set # CONFIG_PATA_OLDPIIX is not set # CONFIG_PATA_NETCELL is not set # CONFIG_PATA_NINJA32 is not set # CONFIG_PATA_NS87410 is not set # CONFIG_PATA_NS87415 is not set # CONFIG_PATA_OPTI is not set # CONFIG_PATA_OPTIDMA is not set # CONFIG_PATA_PDC_OLD is not set # CONFIG_PATA_RADISYS is not set # CONFIG_PATA_RZ1000 is not set # CONFIG_PATA_SC1200 is not set # CONFIG_PATA_SERVERWORKS is not set # CONFIG_PATA_PDC2027X is not set # CONFIG_PATA_SIL680 is not set # CONFIG_PATA_SIS is not set # CONFIG_PATA_VIA is not set # CONFIG_PATA_WINBOND is not set # CONFIG_PATA_SCH is not set CONFIG_MD=y CONFIG_BLK_DEV_MD=m CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m # CONFIG_MD_RAID10 is not set CONFIG_MD_RAID456=m CONFIG_MD_RAID6_PQ=m CONFIG_MD_MULTIPATH=m # CONFIG_MD_FAULTY is not set CONFIG_BLK_DEV_DM=y # CONFIG_DM_DEBUG is not set # CONFIG_DM_CRYPT is not set # CONFIG_DM_SNAPSHOT is not set # CONFIG_DM_MIRROR is not set # CONFIG_DM_ZERO is not set # CONFIG_DM_MULTIPATH is not set # CONFIG_DM_DELAY is not set # CONFIG_DM_UEVENT is not set CONFIG_FUSION=y CONFIG_FUSION_SPI=m CONFIG_FUSION_FC=m CONFIG_FUSION_SAS=m CONFIG_FUSION_MAX_SGE=128 CONFIG_FUSION_CTL=m # CONFIG_FUSION_LOGGING is not set # # IEEE 1394 (FireWire) support # # # Enable only one of the two stacks, unless you know what you are doing # # CONFIG_FIREWIRE is not set # CONFIG_IEEE1394 is not set # CONFIG_I2O is not set # CONFIG_MACINTOSH_DRIVERS is not set CONFIG_NETDEVICES=y CONFIG_COMPAT_NET_DEV_OPS=y CONFIG_DUMMY=m CONFIG_BONDING=m # CONFIG_MACVLAN is not set CONFIG_EQUALIZER=m CONFIG_TUN=y CONFIG_VETH=m # CONFIG_NET_SB1000 is not set # CONFIG_ARCNET is not set CONFIG_PHYLIB=y # # MII PHY device drivers # # CONFIG_MARVELL_PHY is not set # CONFIG_DAVICOM_PHY is not set # 
CONFIG_QSEMI_PHY is not set # CONFIG_LXT_PHY is not set # CONFIG_CICADA_PHY is not set # CONFIG_VITESSE_PHY is not set # CONFIG_SMSC_PHY is not set # CONFIG_BROADCOM_PHY is not set # CONFIG_ICPLUS_PHY is not set # CONFIG_REALTEK_PHY is not set # CONFIG_NATIONAL_PHY is not set # CONFIG_STE10XP is not set # CONFIG_LSI_ET1011C_PHY is not set # CONFIG_FIXED_PHY is not set # CONFIG_MDIO_BITBANG is not set CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set # CONFIG_CASSINI is not set CONFIG_NET_VENDOR_3COM=y CONFIG_VORTEX=y # CONFIG_TYPHOON is not set # CONFIG_ETHOC is not set # CONFIG_DNET is not set CONFIG_NET_TULIP=y # CONFIG_DE2104X is not set CONFIG_TULIP=y # CONFIG_TULIP_MWI is not set # CONFIG_TULIP_MMIO is not set # CONFIG_TULIP_NAPI is not set # CONFIG_DE4X5 is not set # CONFIG_WINBOND_840 is not set # CONFIG_DM9102 is not set # CONFIG_ULI526X is not set # CONFIG_HP100 is not set # CONFIG_IBM_NEW_EMAC_ZMII is not set # CONFIG_IBM_NEW_EMAC_RGMII is not set # CONFIG_IBM_NEW_EMAC_TAH is not set # CONFIG_IBM_NEW_EMAC_EMAC4 is not set # CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set # CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set # CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set CONFIG_AMD8111_ETH=y # CONFIG_ADAPTEC_STARFIRE is not set CONFIG_B44=y CONFIG_B44_PCI_AUTOSELECT=y CONFIG_B44_PCICORE_AUTOSELECT=y CONFIG_B44_PCI=y CONFIG_FORCEDETH=y # CONFIG_FORCEDETH_NAPI is not set CONFIG_E100=y # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set CONFIG_8139CP=y CONFIG_8139TOO=y # CONFIG_8139TOO_PIO is not set # CONFIG_8139TOO_TUNE_TWISTER is not set # CONFIG_8139TOO_8129 is not set # CONFIG_8139_OLD_RX_RESET is not set # CONFIG_R6040 is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SMSC9420 is not set # CONFIG_SUNDANCE is not set # CONFIG_TLAN is not set # CONFIG_VIA_RHINE is not set # CONFIG_SC92031 is not set # CONFIG_ATL2 is not 
set CONFIG_NETDEV_1000=y # CONFIG_ACENIC is not set # CONFIG_DL2K is not set CONFIG_E1000=y CONFIG_E1000E=y # CONFIG_IP1000 is not set # CONFIG_IGB is not set # CONFIG_IGBVF is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SIS190 is not set # CONFIG_SKGE is not set # CONFIG_SKY2 is not set # CONFIG_VIA_VELOCITY is not set CONFIG_TIGON3=y CONFIG_BNX2=y # CONFIG_QLA3XXX is not set # CONFIG_ATL1 is not set # CONFIG_ATL1E is not set # CONFIG_ATL1C is not set # CONFIG_JME is not set CONFIG_NETDEV_10000=y # CONFIG_CHELSIO_T1 is not set CONFIG_CHELSIO_T3_DEPENDS=y # CONFIG_CHELSIO_T3 is not set # CONFIG_ENIC is not set # CONFIG_IXGBE is not set # CONFIG_IXGB is not set CONFIG_S2IO=m # CONFIG_VXGE is not set # CONFIG_MYRI10GE is not set # CONFIG_NETXEN_NIC is not set # CONFIG_NIU is not set # CONFIG_MLX4_EN is not set # CONFIG_MLX4_CORE is not set # CONFIG_TEHUTI is not set # CONFIG_BNX2X is not set # CONFIG_QLGE is not set # CONFIG_SFC is not set # CONFIG_BE2NET is not set # CONFIG_TR is not set # # Wireless LAN # # CONFIG_WLAN_PRE80211 is not set # CONFIG_WLAN_80211 is not set # # Enable WiMAX (Networking options) to see the WiMAX drivers # # # USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NET_FC is not set CONFIG_NETCONSOLE=y # CONFIG_NETCONSOLE_DYNAMIC is not set CONFIG_NETPOLL=y # CONFIG_NETPOLL_TRAP is not set CONFIG_NET_POLL_CONTROLLER=y CONFIG_VIRTIO_NET=m # CONFIG_ISDN is not set # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y CONFIG_INPUT_FF_MEMLESS=m # CONFIG_INPUT_POLLDEV is not set # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set CONFIG_INPUT_EVDEV=y # CONFIG_INPUT_EVBUG is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set # CONFIG_KEYBOARD_STOWAWAY is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y CONFIG_MOUSE_PS2_ALPS=y CONFIG_MOUSE_PS2_LOGIPS2PP=y CONFIG_MOUSE_PS2_SYNAPTICS=y CONFIG_MOUSE_PS2_LIFEBOOK=y CONFIG_MOUSE_PS2_TRACKPOINT=y # CONFIG_MOUSE_PS2_ELANTECH is not set # CONFIG_MOUSE_PS2_TOUCHKIT is not set # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_APPLETOUCH is not set # CONFIG_MOUSE_BCM5974 is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TABLET is not set # CONFIG_INPUT_TOUCHSCREEN is not set # CONFIG_INPUT_MISC is not set # # Hardware I/O ports # CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # CONFIG_GAMEPORT is not set # # Character devices # CONFIG_VT=y CONFIG_CONSOLE_TRANSLATIONS=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_VT_HW_CONSOLE_BINDING is not set CONFIG_DEVKMEM=y # CONFIG_SERIAL_NONSTANDARD is not set # CONFIG_NOZOMI is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_SERIAL_8250_PCI=y CONFIG_SERIAL_8250_PNP=y CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y # CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 CONFIG_HVC_DRIVER=y CONFIG_VIRTIO_CONSOLE=m CONFIG_IPMI_HANDLER=m # CONFIG_IPMI_PANIC_EVENT is not set 
CONFIG_IPMI_DEVICE_INTERFACE=m CONFIG_IPMI_SI=m CONFIG_IPMI_WATCHDOG=m CONFIG_IPMI_POWEROFF=m CONFIG_HW_RANDOM=y # CONFIG_HW_RANDOM_TIMERIOMEM is not set CONFIG_HW_RANDOM_INTEL=y CONFIG_HW_RANDOM_AMD=y # CONFIG_HW_RANDOM_VIRTIO is not set # CONFIG_NVRAM is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # CONFIG_MWAVE is not set # CONFIG_PC8736x_GPIO is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 CONFIG_HPET=y CONFIG_HPET_MMAP=y # CONFIG_HANGCHECK_TIMER is not set CONFIG_TCG_TPM=y CONFIG_TCG_TIS=y # CONFIG_TCG_NSC is not set # CONFIG_TCG_ATMEL is not set # CONFIG_TCG_INFINEON is not set # CONFIG_TELCLOCK is not set CONFIG_DEVPORT=y CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y # CONFIG_I2C_CHARDEV is not set CONFIG_I2C_HELPER_AUTO=y CONFIG_I2C_ALGOBIT=y # # I2C Hardware Bus support # # # PC SMBus host controller drivers # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_ISCH is not set # CONFIG_I2C_PIIX4 is not set # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # # I2C system bus drivers (mostly embedded / system-on-chip) # # CONFIG_I2C_OCORES is not set # CONFIG_I2C_SIMTEC is not set # # External I2C/SMBus adapter drivers # # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_TAOS_EVM is not set # CONFIG_I2C_TINY_USB is not set # # Graphics adapter I2C/DDC channel drivers # # CONFIG_I2C_VOODOO3 is not set # # Other I2C/SMBus bus drivers # # CONFIG_I2C_PCA_PLATFORM is not set # CONFIG_I2C_STUB is not set # # Miscellaneous I2C Chip support # # CONFIG_DS1682 is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_PCF8575 is not set # CONFIG_SENSORS_PCA9539 is not set # CONFIG_SENSORS_TSL2550 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set 
# CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # CONFIG_SPI is not set # # PPS support # # CONFIG_PPS is not set CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y # CONFIG_GPIOLIB is not set # CONFIG_W1 is not set CONFIG_POWER_SUPPLY=y # CONFIG_POWER_SUPPLY_DEBUG is not set # CONFIG_PDA_POWER is not set # CONFIG_BATTERY_DS2760 is not set # CONFIG_BATTERY_BQ27x00 is not set CONFIG_HWMON=m # CONFIG_HWMON_VID is not set # CONFIG_SENSORS_ABITUGURU is not set # CONFIG_SENSORS_ABITUGURU3 is not set # CONFIG_SENSORS_AD7414 is not set # CONFIG_SENSORS_AD7418 is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1029 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ADM9240 is not set # CONFIG_SENSORS_ADT7462 is not set # CONFIG_SENSORS_ADT7470 is not set # CONFIG_SENSORS_ADT7473 is not set # CONFIG_SENSORS_ADT7475 is not set # CONFIG_SENSORS_K8TEMP is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_ATK0110 is not set # CONFIG_SENSORS_ATXP1 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_I5K_AMB is not set # CONFIG_SENSORS_F71805F is not set # CONFIG_SENSORS_F71882FG is not set # CONFIG_SENSORS_F75375S is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_FSCPOS is not set # CONFIG_SENSORS_FSCHMD is not set # CONFIG_SENSORS_G760A is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_GL520SM is not set CONFIG_SENSORS_CORETEMP=m # CONFIG_SENSORS_IBMAEM is not set # CONFIG_SENSORS_IBMPEX is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_LM92 is not set # CONFIG_SENSORS_LM93 is not set # CONFIG_SENSORS_LTC4215 is not set # 
CONFIG_SENSORS_LTC4245 is not set # CONFIG_SENSORS_LM95241 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_MAX6650 is not set # CONFIG_SENSORS_PC87360 is not set # CONFIG_SENSORS_PC87427 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_SIS5595 is not set # CONFIG_SENSORS_DME1737 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_SMSC47M192 is not set # CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_ADS7828 is not set # CONFIG_SENSORS_THMC50 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_VT1211 is not set # CONFIG_SENSORS_VT8231 is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83791D is not set # CONFIG_SENSORS_W83792D is not set # CONFIG_SENSORS_W83793 is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83L786NG is not set # CONFIG_SENSORS_W83627HF is not set # CONFIG_SENSORS_W83627EHF is not set # CONFIG_SENSORS_HDAPS is not set # CONFIG_SENSORS_LIS3LV02D is not set # CONFIG_SENSORS_APPLESMC is not set # CONFIG_HWMON_DEBUG_CHIP is not set CONFIG_THERMAL=y # CONFIG_WATCHDOG is not set CONFIG_SSB_POSSIBLE=y # # Sonics Silicon Backplane # CONFIG_SSB=y CONFIG_SSB_SPROM=y CONFIG_SSB_PCIHOST_POSSIBLE=y CONFIG_SSB_PCIHOST=y # CONFIG_SSB_B43_PCI_BRIDGE is not set # CONFIG_SSB_DEBUG is not set CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y CONFIG_SSB_DRIVER_PCICORE=y # # Multifunction device drivers # # CONFIG_MFD_CORE is not set # CONFIG_MFD_SM501 is not set # CONFIG_HTC_PASIC3 is not set # CONFIG_TWL4030_CORE is not set # CONFIG_MFD_TMIO is not set # CONFIG_PMIC_DA903X is not set # CONFIG_MFD_WM8400 is not set # CONFIG_MFD_WM8350_I2C is not set # CONFIG_MFD_PCF50633 is not set # CONFIG_REGULATOR is not set # # Multimedia devices # # # Multimedia core support # # CONFIG_VIDEO_DEV is not set # CONFIG_DVB_CORE is not set # CONFIG_VIDEO_MEDIA is not set # # Multimedia drivers # CONFIG_DAB=y # CONFIG_USB_DABUSB is not set # # Graphics support # CONFIG_AGP=y CONFIG_AGP_AMD64=y 
CONFIG_AGP_INTEL=y # CONFIG_AGP_SIS is not set # CONFIG_AGP_VIA is not set # CONFIG_DRM is not set # CONFIG_VGASTATE is not set # CONFIG_VIDEO_OUTPUT_CONTROL is not set CONFIG_FB=y # CONFIG_FIRMWARE_EDID is not set CONFIG_FB_DDC=y CONFIG_FB_BOOT_VESA_SUPPORT=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y # CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set # CONFIG_FB_SYS_FILLRECT is not set # CONFIG_FB_SYS_COPYAREA is not set # CONFIG_FB_SYS_IMAGEBLIT is not set # CONFIG_FB_FOREIGN_ENDIAN is not set # CONFIG_FB_SYS_FOPS is not set # CONFIG_FB_SVGALIB is not set # CONFIG_FB_MACMODES is not set # CONFIG_FB_BACKLIGHT is not set CONFIG_FB_MODE_HELPERS=y # CONFIG_FB_TILEBLITTING is not set # # Frame buffer hardware drivers # # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set # CONFIG_FB_ARC is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set # CONFIG_FB_UVESA is not set # CONFIG_FB_VESA is not set # CONFIG_FB_N411 is not set # CONFIG_FB_HGA is not set # CONFIG_FB_S1D13XXX is not set # CONFIG_FB_NVIDIA is not set # CONFIG_FB_RIVA is not set # CONFIG_FB_LE80578 is not set CONFIG_FB_INTEL=y # CONFIG_FB_INTEL_DEBUG is not set CONFIG_FB_INTEL_I2C=y # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_S3 is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_VIA is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_VT8623 is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_ARK is not set # CONFIG_FB_PM3 is not set # CONFIG_FB_CARMINE is not set # CONFIG_FB_GEODE is not set # CONFIG_FB_VIRTUAL is not set # CONFIG_FB_METRONOME is not set # CONFIG_FB_MB862XX is not set # CONFIG_FB_BROADSHEET is not set # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Display device support # # 
CONFIG_DISPLAY_SUPPORT is not set # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_VGACON_SOFT_SCROLLBACK=y CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=256 CONFIG_DUMMY_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE is not set CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y CONFIG_SOUND=y CONFIG_SOUND_OSS_CORE=y # CONFIG_SND is not set CONFIG_SOUND_PRIME=y # CONFIG_SOUND_OSS is not set CONFIG_HID_SUPPORT=y CONFIG_HID=y CONFIG_HID_DEBUG=y # CONFIG_HIDRAW is not set # # USB Input Devices # CONFIG_USB_HID=m # CONFIG_HID_PID is not set # CONFIG_USB_HIDDEV is not set # # Special HID drivers # CONFIG_HID_A4TECH=m CONFIG_HID_APPLE=m CONFIG_HID_BELKIN=m CONFIG_HID_CHERRY=m CONFIG_HID_CHICONY=m CONFIG_HID_CYPRESS=m # CONFIG_DRAGONRISE_FF is not set CONFIG_HID_EZKEY=m CONFIG_HID_KYE=m CONFIG_HID_GYRATION=m CONFIG_HID_KENSINGTON=m CONFIG_HID_LOGITECH=m # CONFIG_LOGITECH_FF is not set # CONFIG_LOGIRUMBLEPAD2_FF is not set CONFIG_HID_MICROSOFT=m CONFIG_HID_MONTEREY=m CONFIG_HID_NTRIG=m CONFIG_HID_PANTHERLORD=m # CONFIG_PANTHERLORD_FF is not set CONFIG_HID_PETALYNX=m CONFIG_HID_SAMSUNG=m CONFIG_HID_SONY=m CONFIG_HID_SUNPLUS=m # CONFIG_GREENASIA_FF is not set CONFIG_HID_TOPSEED=m # CONFIG_THRUSTMASTER_FF is not set # CONFIG_ZEROPLUS_FF is not set CONFIG_USB_SUPPORT=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y CONFIG_USB=m # CONFIG_USB_DEBUG is not set # CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set # # Miscellaneous USB options # CONFIG_USB_DEVICE_CLASS=y # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_SUSPEND is not set # CONFIG_USB_OTG is not set CONFIG_USB_MON=m # CONFIG_USB_WUSB is not set # CONFIG_USB_WUSB_CBAF is not set # # USB Host Controller Drivers # # CONFIG_USB_C67X00_HCD is not set # CONFIG_USB_XHCI_HCD is not set CONFIG_USB_EHCI_HCD=m # CONFIG_USB_EHCI_ROOT_HUB_TT is not set # CONFIG_USB_EHCI_TT_NEWSCHED is not set # CONFIG_USB_OXU210HP_HCD is not set # CONFIG_USB_ISP116X_HCD is not 
set # CONFIG_USB_ISP1760_HCD is not set CONFIG_USB_OHCI_HCD=m # CONFIG_USB_OHCI_HCD_SSB is not set # CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set # CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set CONFIG_USB_OHCI_LITTLE_ENDIAN=y CONFIG_USB_UHCI_HCD=m # CONFIG_USB_SL811_HCD is not set # CONFIG_USB_R8A66597_HCD is not set # CONFIG_USB_WHCI_HCD is not set # CONFIG_USB_HWA_HCD is not set # # Enable Host or Gadget support to see Inventra options # # # USB Device Class drivers # # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set # CONFIG_USB_WDM is not set # CONFIG_USB_TMC is not set # # NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may # # # also be needed; see USB_STORAGE Help for more info # CONFIG_USB_STORAGE=m # CONFIG_USB_STORAGE_DEBUG is not set # CONFIG_USB_STORAGE_DATAFAB is not set # CONFIG_USB_STORAGE_FREECOM is not set # CONFIG_USB_STORAGE_ISD200 is not set # CONFIG_USB_STORAGE_USBAT is not set # CONFIG_USB_STORAGE_SDDR09 is not set # CONFIG_USB_STORAGE_SDDR55 is not set # CONFIG_USB_STORAGE_JUMPSHOT is not set # CONFIG_USB_STORAGE_ALAUDA is not set # CONFIG_USB_STORAGE_ONETOUCH is not set # CONFIG_USB_STORAGE_KARMA is not set # CONFIG_USB_STORAGE_CYPRESS_ATACB is not set # CONFIG_USB_LIBUSUAL is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set # # USB port drivers # # CONFIG_USB_SERIAL is not set # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_ADUTUX is not set # CONFIG_USB_SEVSEG is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_BERRY_CHARGE is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYPRESS_CY7C63 is not set # CONFIG_USB_CYTHERM is not set # CONFIG_USB_IDMOUSE is not set # CONFIG_USB_FTDI_ELAN is not set # CONFIG_USB_APPLEDISPLAY is not set # CONFIG_USB_SISUSBVGA is not set # CONFIG_USB_LD is not set # CONFIG_USB_TRANCEVIBRATOR is not set # CONFIG_USB_IOWARRIOR is not set # 
CONFIG_USB_ISIGHTFW is not set # CONFIG_USB_VST is not set # CONFIG_USB_GADGET is not set # # OTG and related infrastructure # # CONFIG_NOP_USB_XCEIV is not set # CONFIG_UWB is not set # CONFIG_MMC is not set # CONFIG_MEMSTICK is not set # CONFIG_NEW_LEDS is not set # CONFIG_ACCESSIBILITY is not set # CONFIG_INFINIBAND is not set # CONFIG_EDAC is not set CONFIG_RTC_LIB=m CONFIG_RTC_CLASS=m # # RTC interfaces # CONFIG_RTC_INTF_SYSFS=y CONFIG_RTC_INTF_PROC=y CONFIG_RTC_INTF_DEV=y # CONFIG_RTC_INTF_DEV_UIE_EMUL is not set # CONFIG_RTC_DRV_TEST is not set # # I2C RTC drivers # # CONFIG_RTC_DRV_DS1307 is not set # CONFIG_RTC_DRV_DS1374 is not set # CONFIG_RTC_DRV_DS1672 is not set # CONFIG_RTC_DRV_DS1685 is not set # CONFIG_RTC_DRV_MAX6900 is not set # CONFIG_RTC_DRV_RS5C372 is not set # CONFIG_RTC_DRV_ISL1208 is not set # CONFIG_RTC_DRV_X1205 is not set # CONFIG_RTC_DRV_PCF8563 is not set # CONFIG_RTC_DRV_PCF8583 is not set # CONFIG_RTC_DRV_M41T80 is not set # CONFIG_RTC_DRV_S35390A is not set # CONFIG_RTC_DRV_FM3130 is not set # CONFIG_RTC_DRV_RX8581 is not set # CONFIG_RTC_DRV_RX8025 is not set # # SPI RTC drivers # # # Platform RTC drivers # CONFIG_RTC_DRV_CMOS=m # CONFIG_RTC_DRV_DS1286 is not set # CONFIG_RTC_DRV_DS1511 is not set # CONFIG_RTC_DRV_DS1553 is not set # CONFIG_RTC_DRV_DS1742 is not set # CONFIG_RTC_DRV_STK17TA8 is not set # CONFIG_RTC_DRV_M48T86 is not set # CONFIG_RTC_DRV_M48T35 is not set # CONFIG_RTC_DRV_M48T59 is not set # CONFIG_RTC_DRV_MSM6242 is not set # CONFIG_RTC_DRV_BQ4802 is not set # CONFIG_RTC_DRV_RP5C01 is not set # CONFIG_RTC_DRV_V3020 is not set # # on-CPU RTC drivers # # CONFIG_DMADEVICES is not set # CONFIG_AUXDISPLAY is not set CONFIG_UIO=m # CONFIG_UIO_CIF is not set # CONFIG_UIO_PDRV is not set # CONFIG_UIO_PDRV_GENIRQ is not set # CONFIG_UIO_SMX is not set # CONFIG_UIO_AEC is not set # CONFIG_UIO_SERCOS3 is not set # CONFIG_STAGING is not set CONFIG_X86_PLATFORM_DEVICES=y # CONFIG_ASUS_LAPTOP is not set # CONFIG_THINKPAD_ACPI is 
not set CONFIG_INTEL_MENLOW=m # CONFIG_EEEPC_LAPTOP is not set # CONFIG_ACPI_WMI is not set # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_TOSHIBA is not set # # Firmware Drivers # # CONFIG_EDD is not set CONFIG_FIRMWARE_MEMMAP=y # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_DMIID=y # CONFIG_ISCSI_IBFT_FIND is not set # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y # CONFIG_EXT2_FS_SECURITY is not set # CONFIG_EXT2_FS_XIP is not set CONFIG_EXT3_FS=y # CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y # CONFIG_EXT3_FS_SECURITY is not set # CONFIG_EXT4_FS is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set # CONFIG_REISERFS_PROC_INFO is not set CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y # CONFIG_REISERFS_FS_SECURITY is not set # CONFIG_REISER4_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y CONFIG_FILE_LOCKING=y # CONFIG_XFS_FS is not set # CONFIG_GFS2_FS is not set # CONFIG_OCFS2_FS is not set # CONFIG_BTRFS_FS is not set CONFIG_FSNOTIFY=y CONFIG_DNOTIFY=y CONFIG_INOTIFY=y CONFIG_INOTIFY_USER=y # CONFIG_QUOTA is not set # CONFIG_AUTOFS_FS is not set # CONFIG_AUTOFS4_FS is not set # CONFIG_FUSE_FS is not set CONFIG_GENERIC_ACL=y # # Caches # # CONFIG_FSCACHE is not set # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y # CONFIG_ZISOFS is not set # CONFIG_UDF_FS is not set # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_PROC_VMCORE=y CONFIG_PROC_SYSCTL=y CONFIG_PROC_PAGE_MONITOR=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y # CONFIG_CONFIGFS_FS is not set CONFIG_MISC_FILESYSTEMS=y # CONFIG_ADFS_FS is not 
set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_HFSPLUS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_SQUASHFS is not set # CONFIG_VXFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_OMFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # CONFIG_NILFS2_FS is not set CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=y CONFIG_NFS_V3=y # CONFIG_NFS_V3_ACL is not set # CONFIG_NFS_V4 is not set CONFIG_ROOT_NFS=y CONFIG_NFSD=y CONFIG_NFSD_V3=y # CONFIG_NFSD_V3_ACL is not set # CONFIG_NFSD_V4 is not set CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=y CONFIG_NFS_COMMON=y CONFIG_SUNRPC=y # CONFIG_RPCSEC_GSS_KRB5 is not set # CONFIG_RPCSEC_GSS_SPKM3 is not set # CONFIG_SMB_FS is not set # CONFIG_CIFS is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # # Partition Types # # CONFIG_PARTITION_ADVANCED is not set CONFIG_MSDOS_PARTITION=y CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set CONFIG_NLS_ASCII=y 
CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=y # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set CONFIG_NLS_UTF8=y # CONFIG_DLM is not set # # Kernel hacking # CONFIG_TRACE_IRQFLAGS_SUPPORT=y # CONFIG_PRINTK_TIME is not set CONFIG_ENABLE_WARN_DEPRECATED=y # CONFIG_ENABLE_MUST_CHECK is not set CONFIG_FRAME_WARN=2048 CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_FS=y # CONFIG_HEADERS_CHECK is not set CONFIG_DEBUG_KERNEL=y # CONFIG_DEBUG_SHIRQ is not set CONFIG_DETECT_SOFTLOCKUP=y # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0 CONFIG_DETECT_HUNG_TASK=y # CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0 CONFIG_SCHED_DEBUG=y CONFIG_SCHEDSTATS=y CONFIG_TIMER_STATS=y # CONFIG_DEBUG_OBJECTS is not set CONFIG_SLQB_DEBUG=y # CONFIG_SLQB_DEBUG_ON is not set # CONFIG_SLQB_SYSFS is not set # CONFIG_DEBUG_RT_MUTEXES is not set # CONFIG_RT_MUTEX_TESTER is not set CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_LOCK_STAT=y CONFIG_DEBUG_LOCKDEP=y CONFIG_TRACE_IRQFLAGS=y CONFIG_DEBUG_SPINLOCK_SLEEP=y # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set CONFIG_STACKTRACE=y # CONFIG_DEBUG_KOBJECT is not set CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_VM=y # CONFIG_DEBUG_VIRTUAL is not set # CONFIG_DEBUG_WRITECOUNT is not set CONFIG_DEBUG_MEMORY_INIT=y # CONFIG_DEBUG_LIST is not set # CONFIG_DEBUG_SG is not set # CONFIG_DEBUG_NOTIFIERS is not set CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_FRAME_POINTER=y # CONFIG_DEBUG_SYNCHRO_TEST is not set # CONFIG_BOOT_PRINTK_DELAY is not set # CONFIG_RCU_TORTURE_TEST 
is not set # CONFIG_RCU_CPU_STALL_DETECTOR is not set # CONFIG_KPROBES_SANITY_TEST is not set # CONFIG_BACKTRACE_SELF_TEST is not set # CONFIG_DEBUG_BLOCK_EXT_DEVT is not set # CONFIG_LKDTM is not set # CONFIG_FAULT_INJECTION is not set # CONFIG_LATENCYTOP is not set CONFIG_SYSCTL_SYSCALL_CHECK=y # CONFIG_DEBUG_PAGEALLOC is not set CONFIG_USER_STACKTRACE_SUPPORT=y CONFIG_NOP_TRACER=y CONFIG_HAVE_FUNCTION_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y CONFIG_HAVE_HW_BRANCH_TRACER=y CONFIG_HAVE_FTRACE_SYSCALLS=y CONFIG_RING_BUFFER=y CONFIG_EVENT_TRACING=y CONFIG_TRACING=y CONFIG_TRACING_SUPPORT=y # CONFIG_FTRACE is not set # CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set # CONFIG_DYNAMIC_DEBUG is not set # CONFIG_DMA_API_DEBUG is not set # CONFIG_SAMPLES is not set CONFIG_HAVE_ARCH_KGDB=y # CONFIG_KGDB is not set # CONFIG_STRICT_DEVMEM is not set CONFIG_X86_VERBOSE_BOOTUP=y CONFIG_EARLY_PRINTK=y # CONFIG_EARLY_PRINTK_DBGP is not set CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_PER_CPU_MAPS is not set # CONFIG_X86_PTDUMP is not set # CONFIG_DEBUG_RODATA is not set # CONFIG_DEBUG_NX_TEST is not set # CONFIG_IOMMU_DEBUG is not set CONFIG_X86_DS_SELFTEST=y CONFIG_HAVE_MMIOTRACE_SUPPORT=y CONFIG_IO_DELAY_TYPE_0X80=0 CONFIG_IO_DELAY_TYPE_0XED=1 CONFIG_IO_DELAY_TYPE_UDELAY=2 CONFIG_IO_DELAY_TYPE_NONE=3 CONFIG_IO_DELAY_0X80=y # CONFIG_IO_DELAY_0XED is not set # CONFIG_IO_DELAY_UDELAY is not set # CONFIG_IO_DELAY_NONE is not set CONFIG_DEFAULT_IO_DELAY_TYPE=0 # CONFIG_DEBUG_BOOT_PARAMS is not set # CONFIG_CPA_DEBUG is not set # CONFIG_OPTIMIZE_INLINING is not set # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set CONFIG_SECURITYFS=y # CONFIG_SECURITY_FILE_CAPABILITIES is not set # CONFIG_IMA is not set CONFIG_XOR_BLOCKS=m CONFIG_ASYNC_CORE=m CONFIG_ASYNC_MEMCPY=m CONFIG_ASYNC_XOR=m CONFIG_CRYPTO=y # # Crypto core or 
helper # # CONFIG_CRYPTO_FIPS is not set CONFIG_CRYPTO_ALGAPI=y CONFIG_CRYPTO_ALGAPI2=y CONFIG_CRYPTO_AEAD2=y CONFIG_CRYPTO_BLKCIPHER2=y CONFIG_CRYPTO_HASH=y CONFIG_CRYPTO_HASH2=y CONFIG_CRYPTO_RNG2=y CONFIG_CRYPTO_PCOMP=y CONFIG_CRYPTO_MANAGER=y CONFIG_CRYPTO_MANAGER2=y # CONFIG_CRYPTO_GF128MUL is not set # CONFIG_CRYPTO_NULL is not set CONFIG_CRYPTO_WORKQUEUE=y # CONFIG_CRYPTO_CRYPTD is not set # CONFIG_CRYPTO_AUTHENC is not set # CONFIG_CRYPTO_TEST is not set # # Authenticated Encryption with Associated Data # # CONFIG_CRYPTO_CCM is not set # CONFIG_CRYPTO_GCM is not set # CONFIG_CRYPTO_SEQIV is not set # # Block modes # # CONFIG_CRYPTO_CBC is not set # CONFIG_CRYPTO_CTR is not set # CONFIG_CRYPTO_CTS is not set # CONFIG_CRYPTO_ECB is not set # CONFIG_CRYPTO_LRW is not set # CONFIG_CRYPTO_PCBC is not set # CONFIG_CRYPTO_XTS is not set # # Hash modes # CONFIG_CRYPTO_HMAC=y # CONFIG_CRYPTO_XCBC is not set # # Digest # # CONFIG_CRYPTO_CRC32C is not set # CONFIG_CRYPTO_CRC32C_INTEL is not set # CONFIG_CRYPTO_MD4 is not set CONFIG_CRYPTO_MD5=y # CONFIG_CRYPTO_MICHAEL_MIC is not set # CONFIG_CRYPTO_RMD128 is not set # CONFIG_CRYPTO_RMD160 is not set # CONFIG_CRYPTO_RMD256 is not set # CONFIG_CRYPTO_RMD320 is not set CONFIG_CRYPTO_SHA1=y # CONFIG_CRYPTO_SHA256 is not set # CONFIG_CRYPTO_SHA512 is not set # CONFIG_CRYPTO_TGR192 is not set # CONFIG_CRYPTO_WP512 is not set # # Ciphers # # CONFIG_CRYPTO_AES is not set # CONFIG_CRYPTO_AES_X86_64 is not set # CONFIG_CRYPTO_AES_NI_INTEL is not set # CONFIG_CRYPTO_ANUBIS is not set # CONFIG_CRYPTO_ARC4 is not set # CONFIG_CRYPTO_BLOWFISH is not set # CONFIG_CRYPTO_CAMELLIA is not set # CONFIG_CRYPTO_CAST5 is not set # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_DES is not set # CONFIG_CRYPTO_FCRYPT is not set # CONFIG_CRYPTO_KHAZAD is not set # CONFIG_CRYPTO_SALSA20 is not set # CONFIG_CRYPTO_SALSA20_X86_64 is not set # CONFIG_CRYPTO_SEED is not set # CONFIG_CRYPTO_SERPENT is not set # CONFIG_CRYPTO_TEA is not set # 
CONFIG_CRYPTO_TWOFISH is not set # CONFIG_CRYPTO_TWOFISH_X86_64 is not set # # Compression # # CONFIG_CRYPTO_DEFLATE is not set # CONFIG_CRYPTO_ZLIB is not set # CONFIG_CRYPTO_LZO is not set # # Random Number Generation # # CONFIG_CRYPTO_ANSI_CPRNG is not set CONFIG_CRYPTO_HW=y # CONFIG_CRYPTO_DEV_PADLOCK is not set # CONFIG_CRYPTO_DEV_HIFN_795X is not set CONFIG_HAVE_KVM=y CONFIG_HAVE_KVM_IRQCHIP=y CONFIG_VIRTUALIZATION=y CONFIG_KVM=m CONFIG_KVM_INTEL=m CONFIG_KVM_AMD=m # CONFIG_KVM_TRACE is not set CONFIG_VIRTIO=m CONFIG_VIRTIO_RING=m CONFIG_VIRTIO_PCI=m CONFIG_VIRTIO_BALLOON=m CONFIG_BINARY_PRINTF=y # # Library routines # CONFIG_BITREVERSE=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_FIND_NEXT_BIT=y CONFIG_GENERIC_FIND_LAST_BIT=y # CONFIG_CRC_CCITT is not set # CONFIG_CRC16 is not set CONFIG_CRC_T10DIF=y # CONFIG_CRC_ITU_T is not set CONFIG_CRC32=y # CONFIG_CRC7 is not set # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=y CONFIG_DECOMPRESS_GZIP=y CONFIG_DECOMPRESS_BZIP2=y CONFIG_DECOMPRESS_LZMA=y CONFIG_HAS_IOMEM=y CONFIG_HAS_IOPORT=y CONFIG_HAS_DMA=y CONFIG_NLATTR=y ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
 2009-05-18 10:11 ` KAMEZAWA Hiroyuki
 2009-05-18 10:45 ` Balbir Singh
@ 2009-05-31 23:51 ` Balbir Singh
 2009-06-01 23:57 ` KAMEZAWA Hiroyuki
 1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-31 23:51 UTC (permalink / raw)
 To: Andrew Morton, KAMEZAWA Hiroyuki
 Cc: linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> >
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > >                 +patch      mmtom-08-may-2009
> > > > AIM9            1362.93     1338.17
> > > > Dbase           17457.75    16021.58
> > > > New Dbase       18070.18    16518.54
> > > > Shared          9681.85     8882.11
> > > > Compute         16197.79    15226.13
> > > >
> > > Hmm, at first impression, I'm not convinced by the numbers...
> > > Just avoiding list_add/del makes programs _10%_ faster ?
> > > Could you show changes in cpu cache-miss rate if you can ?
> > > (And why Aim9 goes bad ?)
> >
> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > this as well
> >
> tested aim7 with some config.
>
> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> Memory: 32G
> HDD: Usual? Scsi disk (just 1 disk)
> (try_to_free_pages() etc...will never be called.)
>
> Multiuser config. #of tasks 1100 (near to peak on my host)
>
> 10runs.
> rc6mm1 score(Jobs/min)
> 44009.1 44844.5 44691.1 43981.9 44992.6
> 44544.9 44179.1 44283.0 44442.9 45033.8 average=44500
>
> +patch
> 44656.8 44270.8 44706.7 44106.1 44467.6
> 44585.3 44167.0 44756.7 44853.9 44249.4 average=44482
>
> Dbase config. #of tasks 25
> rc6mm1 score (jobs/min)
> 11022.7 11018.9 11037.9 11003.8 11087.5
> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
>
> +patch
> 10888.0 10973.7 10913.9 11000.0 10984.9
> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
>
> Hmm, 1% improvement ?
> (I think this is reasonable score of the effect of this patch)
>
> Anyway, I'm afraid of difference between mine and your kernel config.
> plz enjoy your travel for now :)
>

Hi, Andrew,

Could you please pick up these patches for testing. Kamezawa-San, I am
assuming that you are OK with these patches going to -mm for testing?
Would you like me to resend the patches?

	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply [flat|nested] 30+ messages in thread
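For anyone rechecking the averages quoted above, the arithmetic can be sketched in a few lines of C. This is only a recomputation over the scores listed in the message, not anything that was run in the thread. The helper names (`mean`, `rel_change_pct`) and the array names are invented for illustration, and the sign convention (a negative value means the patched kernel scored lower in jobs/min) is an assumption about how one might read the result.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be non-zero). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of `patched` against `base`, in percent. */
static double rel_change_pct(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* aim7 scores (jobs/min) copied from the message above; names are hypothetical. */
static const double multi_base[10] = {
	44009.1, 44844.5, 44691.1, 43981.9, 44992.6,
	44544.9, 44179.1, 44283.0, 44442.9, 45033.8,
};
static const double multi_patch[10] = {
	44656.8, 44270.8, 44706.7, 44106.1, 44467.6,
	44585.3, 44167.0, 44756.7, 44853.9, 44249.4,
};
static const double dbase_base[10] = {
	11022.7, 11018.9, 11037.9, 11003.8, 11087.5,
	11145.2, 11133.6, 11068.3, 11091.3, 11106.6,
};
static const double dbase_patch[10] = {
	10888.0, 10973.7, 10913.9, 11000.0, 10984.9,
	10996.2, 10969.9, 10921.3, 10921.3, 11053.1,
};
```

With these inputs, `rel_change_pct(mean(multi_base, 10), mean(multi_patch, 10))` comes out at roughly -0.04 and the Dbase pair at roughly -0.99, so the two kernels are within about 1% of each other on this host.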
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
 2009-05-31 23:51 ` Balbir Singh
@ 2009-06-01 23:57 ` KAMEZAWA Hiroyuki
 2009-06-05 5:31 ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
 0 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-01 23:57 UTC (permalink / raw)
 To: balbir
 Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

On Mon, 1 Jun 2009 07:51:21 +0800
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:
>
> > On Fri, 15 May 2009 23:46:39 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > >
> > > > Balbir Singh wrote:
> > > > > Feature: Remove the overhead associated with the root cgroup
> > > > >
> > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > > >
> > > > > This patch changes the memory cgroup and removes the overhead associated
> > > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > > can no longer set a memory hard limit in the root cgroup.
> > > > >
> > > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > > pages. A new flag to track whether the page has been accounted or not
> > > > > has been added as well.
> > > > >
> > > > > Review comments highly appreciated
> > > > >
> > > > > Tests
> > > > >
> > > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > > 2. For the root cgroup tested performance impact with reaim
> > > > >
> > > > >
> > > > >                 +patch      mmtom-08-may-2009
> > > > > AIM9            1362.93     1338.17
> > > > > Dbase           17457.75    16021.58
> > > > > New Dbase       18070.18    16518.54
> > > > > Shared          9681.85     8882.11
> > > > > Compute         16197.79    15226.13
> > > > >
> > > > Hmm, at first impression, I'm not convinced by the numbers...
> > > > Just avoiding list_add/del makes programs _10%_ faster ?
> > > > Could you show changes in cpu cache-miss rate if you can ?
> > > > (And why Aim9 goes bad ?)
> > >
> > > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > > this as well
> > >
> > tested aim7 with some config.
> >
> > CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> > Memory: 32G
> > HDD: Usual? Scsi disk (just 1 disk)
> > (try_to_free_pages() etc...will never be called.)
> >
> > Multiuser config. #of tasks 1100 (near to peak on my host)
> >
> > 10runs.
> > rc6mm1 score(Jobs/min)
> > 44009.1 44844.5 44691.1 43981.9 44992.6
> > 44544.9 44179.1 44283.0 44442.9 45033.8 average=44500
> >
> > +patch
> > 44656.8 44270.8 44706.7 44106.1 44467.6
> > 44585.3 44167.0 44756.7 44853.9 44249.4 average=44482
> >
> > Dbase config. #of tasks 25
> > rc6mm1 score (jobs/min)
> > 11022.7 11018.9 11037.9 11003.8 11087.5
> > 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> >
> > +patch
> > 10888.0 10973.7 10913.9 11000.0 10984.9
> > 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> >
> > Hmm, 1% improvement ?
> > (I think this is reasonable score of the effect of this patch)
> >
> > Anyway, I'm afraid of difference between mine and your kernel config.
> > plz enjoy your travel for now :)
> >
>
> Hi, Andrew,
>
> Could you please pick up these patches for testing. Kamezawa-San, I am
> assuming that you are OK with these patches going to -mm for testing?
>
o.k. but..

> Would you like me to resend the patches?
>
It's been 2 weeks since the original post, and several bug fixes have been merged.
Could you post again ? (And it seems Nishimura-san posted some comments.)

Of course, I'll test again.

Thanks,
-Kame

> Balbir

^ permalink raw reply [flat|nested] 30+ messages in thread
* Low overhead patches for the memory cgroup controller (v3)
 2009-06-01 23:57 ` KAMEZAWA Hiroyuki
@ 2009-06-05 5:31 ` Balbir Singh
 2009-06-05 5:51 ` KAMEZAWA Hiroyuki
 ` (3 more replies)
 0 siblings, 4 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-05 5:31 UTC (permalink / raw)
 To: KAMEZAWA Hiroyuki
 Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Here is the new version of the patch with the RFC dropped. Andrew,
Kame, could you please take a look. I am just about to fly out to get
back home tomorrow, so there might be some silence, unless I get to
the next WiFi enabled airport.


From: Balbir Singh <balbir@linux.vnet.ibm.com>

Changelog v3 -> v2

1. Rebase to mmotm 2nd June 2009
2. Test with some of the test cases recommended by Daisuke-San

Changelog v2 -> v1
1. Fix and implement review comments.

Feature: Remove the overhead associated with the root cgroup

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup,
pcg_default_flags is now obsolete, but I've not removed it yet. It
provides some readability to help the code.

Tests Results:

Obtained by

1. Using tmpfs for mounting filesystem
2. Changing sync to be /bin/true (so that sync is not the bottleneck)
3. Used -s #cpus*40 -e #cpus*40

Reaim
		withoutpatch	patch
AIM9		9532.48		9807.59
dbase		19344.60	19285.71
new_dbase	20101.65	20163.13
shared		11827.77	11886.65
compute		17317.38	17420.05

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   12 ++++++++++++
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
 mm/page_cgroup.c            |    1 -
 3 files changed, 50 insertions(+), 5 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..41cc16c 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(AcctLru, ACCT_LRU)
+CLEARPCGFLAG(AcctLru, ACCT_LRU)
+TESTPCGFLAG(AcctLru, ACCT_LRU)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a83e039..9561d10 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -197,6 +198,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcctLru(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcctLru(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
+
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index ecc3918..4406a9c 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);

-- 
	Balbir

^ permalink raw reply related [flat|nested] 30+ messages in thread
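The idea in the patch above (keep the per-zone counter for every page, but skip the list manipulation for root cgroup pages) can be illustrated with a small userspace sketch. This is not kernel code: `struct pc_sketch`, `struct zone_sketch`, and the `pc_*` / `sketch_*` helpers are hypothetical stand-ins, C11 atomics take the place of the kernel's bit operations, a boolean plus a length counter stand in for membership on `pc->lru`, and the `!pc->mem_cgroup` check from the patch is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit positions, mirroring the enum in the patch above. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT_LRU };

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_sketch {
	atomic_ulong flags;
	bool on_list;		/* stands in for being linked on pc->lru */
};

/* Hypothetical stand-in for the per-zone LRU state. */
struct zone_sketch {
	long count;		/* what MEM_CGROUP_ZSTAT() tracks */
	long list_len;		/* number of entries actually linked */
};

static void pc_init(struct pc_sketch *pc)
{
	atomic_init(&pc->flags, 0UL);
	pc->on_list = false;
}

/* Atomic set/clear/test, in the spirit of the SET/CLEAR/TESTPCGFLAG macros. */
static void pc_set(struct pc_sketch *pc, int bit)
{
	atomic_fetch_or(&pc->flags, 1UL << bit);
}

static void pc_clear(struct pc_sketch *pc, int bit)
{
	atomic_fetch_and(&pc->flags, ~(1UL << bit));
}

static bool pc_test(struct pc_sketch *pc, int bit)
{
	return (atomic_load(&pc->flags) & (1UL << bit)) != 0;
}

/* Count every page, but only link non-root pages. */
static void sketch_add_lru(struct zone_sketch *mz, struct pc_sketch *pc)
{
	mz->count += 1;
	pc_set(pc, PCG_ACCT_LRU);
	if (pc_test(pc, PCG_ROOT))
		return;
	pc->on_list = true;
	mz->list_len += 1;
}

/* Undo the accounting; root pages were never linked, so nothing to unlink. */
static void sketch_del_lru(struct zone_sketch *mz, struct pc_sketch *pc)
{
	if (!pc_test(pc, PCG_ACCT_LRU) && !pc->on_list)
		return;
	assert(mz->count > 0);
	mz->count -= 1;
	pc_clear(pc, PCG_ACCT_LRU);
	if (pc_test(pc, PCG_ROOT))
		return;
	pc->on_list = false;
	mz->list_len -= 1;
}
```

In this sketch a page with `PCG_ROOT` set moves `count` up and down but never touches `list_len`, which is why a separate accounted bit is needed: for root pages an empty list entry can no longer tell "not accounted" apart from "accounted but not linked".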
* Re: Low overhead patches for the memory cgroup controller (v3) 2009-06-05 5:31 ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh @ 2009-06-05 5:51 ` KAMEZAWA Hiroyuki 2009-06-05 9:33 ` Balbir Singh 2009-06-05 6:05 ` Daisuke Nishimura ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-06-05 5:51 UTC (permalink / raw) To: balbir Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro On Fri, 5 Jun 2009 13:31:07 +0800 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Here is the new version of the patch with the RFC dropped. Andrew, > Kame, could you please take a look. I am just about to fly out to get > back home tomorrow, so there might be some silence, unless I get to > the next WiFi enabled airport. > > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > Changelog v3 -> v2 > > 1. Rebase to mmotm 2nd June 2009 > 2. Test with some of the test cases recommended by Daisuke-San > > Changelog v2 -> v1 > 1. Fix and implement review comments. > > Feature: Remove the overhead associated with the root cgroup > > This patch changes the memory cgroup and removes the overhead associated > with accounting all pages in the root cgroup. As a side-effect, we can > no longer set a memory hard limit in the root cgroup. > > A new flag is used to track page_cgroup associated with the root cgroup > pages. A new flag to track whether the page has been accounted or not > has been added as well. Flags are now set atomically for page_cgroup, > pcg_default_flags is now obsolete, but I've not removed it yet. It > provides some readability to help the code. > > Tests Results: > > Obtained by > > 1. Using tmpfs for mounting filesystem > 2. Changing sync to be /bin/true (so that sync is not the bottleneck) > 3. 
Used -s #cpus*40 -e #cpus*40 > > Reaim > withoutpatch patch > AIM9 9532.48 9807.59 > dbase 19344.60 19285.71 > new_dbase 20101.65 20163.13 > shared 11827.77 11886.65 > compute 17317.38 17420.05 > A few comments. > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > > include/linux/page_cgroup.h | 12 ++++++++++++ > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- > mm/page_cgroup.c | 1 - > 3 files changed, 50 insertions(+), 5 deletions(-) > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > index 7339c7b..41cc16c 100644 > --- a/include/linux/page_cgroup.h > +++ b/include/linux/page_cgroup.h > @@ -26,6 +26,8 @@ enum { > PCG_LOCK, /* page cgroup is locked */ > PCG_CACHE, /* charged as cache */ > PCG_USED, /* this object is in use. */ > + PCG_ROOT, /* page belongs to root cgroup */ > + PCG_ACCT_LRU, /* page has been accounted for */ > }; > > #define TESTPCGFLAG(uname, lname) \ > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ > > /* Cache flag is set only once (at allocation) */ > TESTPCGFLAG(Cache, CACHE) > +SETPCGFLAG(Cache, CACHE) > > TESTPCGFLAG(Used, USED) > CLEARPCGFLAG(Used, USED) > +SETPCGFLAG(Used, USED) > + > +SETPCGFLAG(Root, ROOT) > +CLEARPCGFLAG(Root, ROOT) > +TESTPCGFLAG(Root, ROOT) > + > +SETPCGFLAG(AcctLru, ACCT_LRU) > +CLEARPCGFLAG(AcctLru, ACCT_LRU) > +TESTPCGFLAG(AcctLru, ACCT_LRU) > I prefer AcctLRU rather than AcctLru. LRU is LRU or lru and not Lru through the kernel. 
> static inline int page_cgroup_nid(struct page_cgroup *pc) > { > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index a83e039..9561d10 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -43,6 +43,7 @@ > > struct cgroup_subsys mem_cgroup_subsys __read_mostly; > #define MEM_CGROUP_RECLAIM_RETRIES 5 > +struct mem_cgroup *root_mem_cgroup __read_mostly; > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */ > @@ -197,6 +198,10 @@ enum charge_type { > #define PCGF_CACHE (1UL << PCG_CACHE) > #define PCGF_USED (1UL << PCG_USED) > #define PCGF_LOCK (1UL << PCG_LOCK) > +/* Not used, but added here for completeness */ > +#define PCGF_ROOT (1UL << PCG_ROOT) > +#define PCGF_ACCT (1UL << PCG_ACCT) > + > static const unsigned long > pcg_default_flags[NR_CHARGE_TYPE] = { > PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */ Could you delete pcg_default_flags? It is of no use after this patch. > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > return; > pc = lookup_page_cgroup(page); > /* can happen while we handle swapcache. */ > - if (list_empty(&pc->lru) || !pc->mem_cgroup) > + if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup) > return; I wonder whether this condition is valid or not.. IMHO, the whole check here should be == if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup) return; mz = page_cgroup_zoneinfo(pc); mem = pc->mem_cgroup; MEM_CGROUP_ZSTAT(mz, lru) -= 1; ClearPageCgroupAcctLru(pc); if (PageCgroupRoot(pc)) return; VM_BUG_ON(list_empty(&pc->lru)); list_del_init(&pc->lru); return; == I'm sorry if there is a case (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru)) > /* > * We don't check PCG_USED bit.
It's cleared when the "page" is finally > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > mz = page_cgroup_zoneinfo(pc); > mem = pc->mem_cgroup; > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > + ClearPageCgroupAcctLru(pc); > + if (PageCgroupRoot(pc)) > + return; > list_del_init(&pc->lru); > return; > } > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) > * For making pc->mem_cgroup visible, insert smp_rmb() here. > */ > smp_rmb(); > - /* unused page is not rotated. */ > - if (!PageCgroupUsed(pc)) > + /* unused or root page is not rotated. */ > + if (!PageCgroupUsed(pc) || PageCgroupRoot(pc)) > return; > mz = page_cgroup_zoneinfo(pc); > list_move(&pc->lru, &mz->lists[lru]); > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > > mz = page_cgroup_zoneinfo(pc); > MEM_CGROUP_ZSTAT(mz, lru) += 1; > + SetPageCgroupAcctLru(pc); > + if (PageCgroupRoot(pc)) > + return; > list_add(&pc->lru, &mz->lists[lru]); > } > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, > css_put(&mem->css); > return; > } > + > pc->mem_cgroup = mem; > smp_wmb(); > - pc->flags = pcg_default_flags[ctype]; > + switch (ctype) { > + case MEM_CGROUP_CHARGE_TYPE_CACHE: > + case MEM_CGROUP_CHARGE_TYPE_SHMEM: > + SetPageCgroupCache(pc); > + SetPageCgroupUsed(pc); > + break; > + case MEM_CGROUP_CHARGE_TYPE_MAPPED: > + SetPageCgroupUsed(pc); > + break; > + default: > + break; > + } > + > + if (mem == root_mem_cgroup) > + SetPageCgroupRoot(pc); > > mem_cgroup_charge_statistics(mem, pc, true); > My concern here is that there will be a racy moment when pc->flags shows PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup. Then, the order of code here should be == if (mem == root_mem_cgroup) SetPageCgroupRoot(pc); pc->mem_cgroup = mem; smp_wmb(); switch (ctype) { case.... } // Used bit is set at last.
== But I wonder whether it would be better to use == static inline int page_cgroup_is_under_root(struct page_cgroup *pc) { return pc->mem_cgroup == root_mem_cgroup; } == I'm not sure why the PageCgroupRoot() "bit" is necessary. Could you clarify the benefit of the Root flag ? > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > mem_cgroup_charge_statistics(mem, pc, false); > > ClearPageCgroupUsed(pc); > + if (mem == root_mem_cgroup) > + ClearPageCgroupRoot(pc); > /* > * pc->mem_cgroup is not cleared here. It will be accessed when it's > * freed from LRU. This is safe because uncharged page is expected not > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, > name = MEMFILE_ATTR(cft->private); > switch (name) { > case RES_LIMIT: > + if (memcg == root_mem_cgroup) { /* Can't set limit on root */ > + ret = -EINVAL; > + break; > + } > /* This function does all necessary parse...reuse it */ > ret = res_counter_memparse_write_strategy(buffer, &val); > if (ret) > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > if (cont->parent == NULL) { > enable_swap_cgroup(); > parent = NULL; > + root_mem_cgroup = mem; > } else { > parent = mem_cgroup_from_cont(cont->parent); > mem->use_hierarchy = parent->use_hierarchy; > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > return &mem->css; > free_out: > __mem_cgroup_free(mem); > + root_mem_cgroup = NULL; > return ERR_PTR(error); > } > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c > index ecc3918..4406a9c 100644 > --- a/mm/page_cgroup.c > +++ b/mm/page_cgroup.c > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) > > #endif > > - > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > static DEFINE_MUTEX(swap_cgroup_mutex); > Unnecessary diff here. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
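A minimal userspace sketch of the two ideas in the review above: deriving root membership from pc->mem_cgroup rather than from a separate Root bit, and setting the Used bit last so that a reader who sees Used also sees the owner pointer. It assumes C11 atomics as stand-ins for the kernel's set_bit()/smp_wmb()/smp_rmb(); the struct layouts and the names commit_charge() and page_cgroup_used() are hypothetical and are not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct mem_cgroup { int id; };

enum { PCG_CACHE, PCG_USED };

struct page_cgroup {
	atomic_ulong flags;
	struct mem_cgroup *_Atomic mem_cgroup;
};

static struct mem_cgroup root_group = { 0 };
static struct mem_cgroup *const root_mem_cgroup = &root_group;

static bool page_cgroup_used(struct page_cgroup *pc)
{
	/* Acquire pairs with the release in commit_charge(). */
	unsigned long flags = atomic_load_explicit(&pc->flags, memory_order_acquire);

	return (flags & (1UL << PCG_USED)) != 0;
}

/* Root membership comes from the owner pointer, so no Root bit is needed. */
static bool page_cgroup_is_under_root(struct page_cgroup *pc)
{
	return atomic_load_explicit(&pc->mem_cgroup, memory_order_relaxed) == root_mem_cgroup;
}

static void commit_charge(struct page_cgroup *pc, struct mem_cgroup *mem, bool cache)
{
	assert(!page_cgroup_used(pc));	/* sketch only: no double charge */

	atomic_store_explicit(&pc->mem_cgroup, mem, memory_order_relaxed);
	if (cache)
		atomic_fetch_or_explicit(&pc->flags, 1UL << PCG_CACHE, memory_order_relaxed);
	/* Used is published last; everything above is visible once it is seen. */
	atomic_fetch_or_explicit(&pc->flags, 1UL << PCG_USED, memory_order_release);
}
```

With this shape there is no window in which Used is visible while the root information is missing, because the root test reads the same pointer that was stored before Used was set.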
* Re: Low overhead patches for the memory cgroup controller (v3) 2009-06-05 5:51 ` KAMEZAWA Hiroyuki @ 2009-06-05 9:33 ` Balbir Singh 2009-06-08 0:20 ` Daisuke Nishimura 0 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2009-06-05 9:33 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-05 14:51:41]: > On Fri, 5 Jun 2009 13:31:07 +0800 > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > Here is the new version of the patch with the RFC dropped. Andrew, > > Kame, could you please take a look. I am just about to fly out to get > > back home tomorrow, so there might be some silence, unless I get to > > the next WiFi enabled airport. > > > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > > > Changelog v3 -> v2 > > > > 1. Rebase to mmotm 2nd June 2009 > > 2. Test with some of the test cases recommended by Daisuke-San > > > > Changelog v2 -> v1 > > 1. Fix and implement review comments. > > > > Feature: Remove the overhead associated with the root cgroup > > > > This patch changes the memory cgroup and removes the overhead associated > > with accounting all pages in the root cgroup. As a side-effect, we can > > no longer set a memory hard limit in the root cgroup. > > > > A new flag is used to track page_cgroup associated with the root cgroup > > pages. A new flag to track whether the page has been accounted or not > > has been added as well. Flags are now set atomically for page_cgroup, > > pcg_default_flags is now obsolete, but I've not removed it yet. It > > provides some readability to help the code. > > > > Tests Results: > > > > Obtained by > > > > 1. Using tmpfs for mounting filesystem > > 2. Changing sync to be /bin/true (so that sync is not the bottleneck) > > 3. 
Used -s #cpus*40 -e #cpus*40 > > > > Reaim > > withoutpatch patch > > AIM9 9532.48 9807.59 > > dbase 19344.60 19285.71 > > new_dbase 20101.65 20163.13 > > shared 11827.77 11886.65 > > compute 17317.38 17420.05 > > > > A few comments. > > > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > > --- > > > > include/linux/page_cgroup.h | 12 ++++++++++++ > > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- > > mm/page_cgroup.c | 1 - > > 3 files changed, 50 insertions(+), 5 deletions(-) > > > > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > > index 7339c7b..41cc16c 100644 > > --- a/include/linux/page_cgroup.h > > +++ b/include/linux/page_cgroup.h > > @@ -26,6 +26,8 @@ enum { > > PCG_LOCK, /* page cgroup is locked */ > > PCG_CACHE, /* charged as cache */ > > PCG_USED, /* this object is in use. */ > > + PCG_ROOT, /* page belongs to root cgroup */ > > + PCG_ACCT_LRU, /* page has been accounted for */ > > }; > > > > #define TESTPCGFLAG(uname, lname) \ > > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ > > > > /* Cache flag is set only once (at allocation) */ > > TESTPCGFLAG(Cache, CACHE) > > +SETPCGFLAG(Cache, CACHE) > > > > TESTPCGFLAG(Used, USED) > > CLEARPCGFLAG(Used, USED) > > +SETPCGFLAG(Used, USED) > > + > > +SETPCGFLAG(Root, ROOT) > > +CLEARPCGFLAG(Root, ROOT) > > +TESTPCGFLAG(Root, ROOT) > > + > > +SETPCGFLAG(AcctLru, ACCT_LRU) > > +CLEARPCGFLAG(AcctLru, ACCT_LRU) > > +TESTPCGFLAG(AcctLru, ACCT_LRU) > > > I prefer AcctLRU rather than AcctLru. LRU is LRU or lru and not Lru through > the kernel. OK, I'll make that change. I agree LRU is better. 
> > > static inline int page_cgroup_nid(struct page_cgroup *pc) > > { > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index a83e039..9561d10 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -43,6 +43,7 @@ > > > > struct cgroup_subsys mem_cgroup_subsys __read_mostly; > > #define MEM_CGROUP_RECLAIM_RETRIES 5 > > +struct mem_cgroup *root_mem_cgroup __read_mostly; > > > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */ > > @@ -197,6 +198,10 @@ enum charge_type { > > #define PCGF_CACHE (1UL << PCG_CACHE) > > #define PCGF_USED (1UL << PCG_USED) > > #define PCGF_LOCK (1UL << PCG_LOCK) > > +/* Not used, but added here for completeness */ > > +#define PCGF_ROOT (1UL << PCG_ROOT) > > +#define PCGF_ACCT (1UL << PCG_ACCT) > > + > > static const unsigned long > > pcg_default_flags[NR_CHARGE_TYPE] = { > > PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */ > > Could you delete this default_flags ? This is of no use after this patch. > Yes, I mentioned in the comment that they are for readability of the code. I can remove them if required. > > > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > > return; > > pc = lookup_page_cgroup(page); > > /* can happen while we handle swapcache. */ > > - if (list_empty(&pc->lru) || !pc->mem_cgroup) > > + if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup) > > return; > I wonder this condition is valid one or not.. > > IMHO, all check here should be > > == > if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup) > return; > mz = page_cgroup_zoneinfo(pc); > mem = pc->mem_cgroup; > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > ClearPageCgroupAcctLru(pc); > if (PageCgroupRoot(pc)) > return; > VM_BUGON(list_empty(&pc->lru); > list_del_init(&pc->lru); > return; We needed this check because 1. After PageCgroupRoot(), list_empty() will always return true for root cgroup 2. 
For non root, it won't The check is enhanced to say, don't go by list_empty(), look to see if this is root. I think we can change the condition and stop relying on list_empty() for the check. I agree. > == > > I'm sorry if there is a case > (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru)) > Should not be, I think the list_empty() was used to indicated already unaccounted, so explicit flags should work fine. > > > /* > > * We don't check PCG_USED bit. It's cleared when the "page" is finally > > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > > mz = page_cgroup_zoneinfo(pc); > > mem = pc->mem_cgroup; > > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > > + ClearPageCgroupAcctLru(pc); > > + if (PageCgroupRoot(pc)) > > + return; > > list_del_init(&pc->lru); > > return; > > } > > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) > > * For making pc->mem_cgroup visible, insert smp_rmb() here. > > */ > > smp_rmb(); > > - /* unused page is not rotated. */ > > - if (!PageCgroupUsed(pc)) > > + /* unused or root page is not rotated. 
*/ > > + if (!PageCgroupUsed(pc) || PageCgroupRoot(pc)) > > return; > > mz = page_cgroup_zoneinfo(pc); > > list_move(&pc->lru, &mz->lists[lru]); > > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > > > > mz = page_cgroup_zoneinfo(pc); > > MEM_CGROUP_ZSTAT(mz, lru) += 1; > > + SetPageCgroupAcctLru(pc); > > + if (PageCgroupRoot(pc)) > > + return; > > list_add(&pc->lru, &mz->lists[lru]); > > } > > > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, > > css_put(&mem->css); > > return; > > } > > + > > pc->mem_cgroup = mem; > > smp_wmb(); > > - pc->flags = pcg_default_flags[ctype]; > > + switch (ctype) { > > + case MEM_CGROUP_CHARGE_TYPE_CACHE: > > + case MEM_CGROUP_CHARGE_TYPE_SHMEM: > > + SetPageCgroupCache(pc); > > + SetPageCgroupUsed(pc); > > + break; > > + case MEM_CGROUP_CHARGE_TYPE_MAPPED: > > + SetPageCgroupUsed(pc); > > + break; > > + default: > > + break; > > + } > > + > > + if (mem == root_mem_cgroup) > > + SetPageCgroupRoot(pc); > > > > mem_cgroup_charge_statistics(mem, pc, true); > > > My concern here is there will be a racy moment that pc->flag shows > PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup. > > Then, The order of code here should be > == > if (mem == root_mem_cgroup) > SetPageCgroupRoot(pc); > pc->mem_cgroup == mem;; > smp_wmb(); > switch(type) { > case.... > } > // Used bit is set at last. > == > > But I wonder it's better to use > == > static inline int page_cgroup_is_under_root(pc) > { > pc->mem_cgroup == root_mem_cgroup; > } > == > I'm not sure why PageCgroupRoot() "bit" is necessary. > Could you clarify the benefit of Root flag ? The Root flags was used for accounting, but I think we can start removing it now. 
> > > > > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > > mem_cgroup_charge_statistics(mem, pc, false); > > > > ClearPageCgroupUsed(pc); > > + if (mem == root_mem_cgroup) > > + ClearPageCgroupRoot(pc); > > /* > > * pc->mem_cgroup is not cleared here. It will be accessed when it's > > * freed from LRU. This is safe because uncharged page is expected not > > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, > > name = MEMFILE_ATTR(cft->private); > > switch (name) { > > case RES_LIMIT: > > + if (memcg == root_mem_cgroup) { /* Can't set limit on root */ > > + ret = -EINVAL; > > + break; > > + } > > /* This function does all necessary parse...reuse it */ > > ret = res_counter_memparse_write_strategy(buffer, &val); > > if (ret) > > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > > if (cont->parent == NULL) { > > enable_swap_cgroup(); > > parent = NULL; > > + root_mem_cgroup = mem; > > } else { > > parent = mem_cgroup_from_cont(cont->parent); > > mem->use_hierarchy = parent->use_hierarchy; > > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > > return &mem->css; > > free_out: > > __mem_cgroup_free(mem); > > + root_mem_cgroup = NULL; > > return ERR_PTR(error); > > } > > > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c > > index ecc3918..4406a9c 100644 > > --- a/mm/page_cgroup.c > > +++ b/mm/page_cgroup.c > > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) > > > > #endif > > > > - > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > > > static DEFINE_MUTEX(swap_cgroup_mutex); > > > Unnecessary diff here. > Yes, I'll add back the space. Thanks for the review -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: Low overhead patches for the memory cgroup controller (v3) 2009-06-05 9:33 ` Balbir Singh @ 2009-06-08 0:20 ` Daisuke Nishimura 0 siblings, 0 replies; 30+ messages in thread From: Daisuke Nishimura @ 2009-06-08 0:20 UTC (permalink / raw) To: balbir Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura On Fri, 5 Jun 2009 17:33:54 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-05 14:51:41]: > > > On Fri, 5 Jun 2009 13:31:07 +0800 > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > > > Here is the new version of the patch with the RFC dropped. Andrew, > > > Kame, could you please take a look. I am just about to fly out to get > > > back home tomorrow, so there might be some silence, unless I get to > > > the next WiFi enabled airport. > > > > > > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > > > > > Changelog v3 -> v2 > > > > > > 1. Rebase to mmotm 2nd June 2009 > > > 2. Test with some of the test cases recommended by Daisuke-San > > > > > > Changelog v2 -> v1 > > > 1. Fix and implement review comments. > > > > > > Feature: Remove the overhead associated with the root cgroup > > > > > > This patch changes the memory cgroup and removes the overhead associated > > > with accounting all pages in the root cgroup. As a side-effect, we can > > > no longer set a memory hard limit in the root cgroup. > > > > > > A new flag is used to track page_cgroup associated with the root cgroup > > > pages. A new flag to track whether the page has been accounted or not > > > has been added as well. Flags are now set atomically for page_cgroup, > > > pcg_default_flags is now obsolete, but I've not removed it yet. It > > > provides some readability to help the code. > > > > > > Tests Results: > > > > > > Obtained by > > > > > > 1. Using tmpfs for mounting filesystem > > > 2. 
Changing sync to be /bin/true (so that sync is not the bottleneck) > > > 3. Used -s #cpus*40 -e #cpus*40 > > > > > > Reaim > > > withoutpatch patch > > > AIM9 9532.48 9807.59 > > > dbase 19344.60 19285.71 > > > new_dbase 20101.65 20163.13 > > > shared 11827.77 11886.65 > > > compute 17317.38 17420.05 > > > > > > > A few comments. > > > > > > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > > > --- > > > > > > include/linux/page_cgroup.h | 12 ++++++++++++ > > > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- > > > mm/page_cgroup.c | 1 - > > > 3 files changed, 50 insertions(+), 5 deletions(-) > > > > > > > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > > > index 7339c7b..41cc16c 100644 > > > --- a/include/linux/page_cgroup.h > > > +++ b/include/linux/page_cgroup.h > > > @@ -26,6 +26,8 @@ enum { > > > PCG_LOCK, /* page cgroup is locked */ > > > PCG_CACHE, /* charged as cache */ > > > PCG_USED, /* this object is in use. */ > > > + PCG_ROOT, /* page belongs to root cgroup */ > > > + PCG_ACCT_LRU, /* page has been accounted for */ > > > }; > > > > > > #define TESTPCGFLAG(uname, lname) \ > > > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ > > > > > > /* Cache flag is set only once (at allocation) */ > > > TESTPCGFLAG(Cache, CACHE) > > > +SETPCGFLAG(Cache, CACHE) > > > > > > TESTPCGFLAG(Used, USED) > > > CLEARPCGFLAG(Used, USED) > > > +SETPCGFLAG(Used, USED) > > > + > > > +SETPCGFLAG(Root, ROOT) > > > +CLEARPCGFLAG(Root, ROOT) > > > +TESTPCGFLAG(Root, ROOT) > > > + > > > +SETPCGFLAG(AcctLru, ACCT_LRU) > > > +CLEARPCGFLAG(AcctLru, ACCT_LRU) > > > +TESTPCGFLAG(AcctLru, ACCT_LRU) > > > > > I prefer AcctLRU rather than AcctLru. LRU is LRU or lru and not Lru through > > the kernel. > > OK, I'll make that change. I agree LRU is better. 
> > > > > > static inline int page_cgroup_nid(struct page_cgroup *pc) > > > { > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index a83e039..9561d10 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -43,6 +43,7 @@ > > > > > > struct cgroup_subsys mem_cgroup_subsys __read_mostly; > > > #define MEM_CGROUP_RECLAIM_RETRIES 5 > > > +struct mem_cgroup *root_mem_cgroup __read_mostly; > > > > > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > > /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */ > > > @@ -197,6 +198,10 @@ enum charge_type { > > > #define PCGF_CACHE (1UL << PCG_CACHE) > > > #define PCGF_USED (1UL << PCG_USED) > > > #define PCGF_LOCK (1UL << PCG_LOCK) > > > +/* Not used, but added here for completeness */ > > > +#define PCGF_ROOT (1UL << PCG_ROOT) > > > +#define PCGF_ACCT (1UL << PCG_ACCT) > > > + > > > static const unsigned long > > > pcg_default_flags[NR_CHARGE_TYPE] = { > > > PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */ > > > > Could you delete this default_flags ? This is of no use after this patch. > > > > Yes, I mentioned in the comment that they are for readability of the > code. I can remove them if required. > > > > > > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > > > return; > > > pc = lookup_page_cgroup(page); > > > /* can happen while we handle swapcache. */ > > > - if (list_empty(&pc->lru) || !pc->mem_cgroup) > > > + if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup) > > > return; > > I wonder this condition is valid one or not.. > > > > IMHO, all check here should be > > > > == > > if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup) I think checking !pc->mem_cgroup would also be verbose, it can be changed to VM_BUG_ON(). And wouldn't "if (!TestClearPageCgroupAcctLRU(pc))" be better ? We can remove ClearPageCgroupAcctLRU() then. 
> > return; > > mz = page_cgroup_zoneinfo(pc); > > mem = pc->mem_cgroup; > > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > > ClearPageCgroupAcctLru(pc); > > if (PageCgroupRoot(pc)) > > return; > > VM_BUGON(list_empty(&pc->lru); > > list_del_init(&pc->lru); > > return; > > We needed this check because > > 1. After PageCgroupRoot(), list_empty() will always return true for > root cgroup > 2. For non root, it won't > > The check is enhanced to say, don't go by list_empty(), look to see if > this is root. > > I think we can change the condition and stop relying on list_empty() > for the check. I agree. > > > > == > > > > I'm sorry if there is a case > > (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru)) > > > > Should not be, I think the list_empty() was used to indicated already > unaccounted, so explicit flags should work fine. > > > > > > /* > > > * We don't check PCG_USED bit. It's cleared when the "page" is finally > > > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > > > mz = page_cgroup_zoneinfo(pc); > > > mem = pc->mem_cgroup; Can you delete this obsolete line ? > > > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > > > + ClearPageCgroupAcctLru(pc); > > > + if (PageCgroupRoot(pc)) > > > + return; > > > list_del_init(&pc->lru); > > > return; > > > } > > > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) > > > * For making pc->mem_cgroup visible, insert smp_rmb() here. > > > */ > > > smp_rmb(); > > > - /* unused page is not rotated. */ > > > - if (!PageCgroupUsed(pc)) > > > + /* unused or root page is not rotated.
*/ > > > + if (!PageCgroupUsed(pc) || PageCgroupRoot(pc)) > > > return; > > > mz = page_cgroup_zoneinfo(pc); > > > list_move(&pc->lru, &mz->lists[lru]); > > > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > > > > > > mz = page_cgroup_zoneinfo(pc); > > > MEM_CGROUP_ZSTAT(mz, lru) += 1; > > > + SetPageCgroupAcctLru(pc); > > > + if (PageCgroupRoot(pc)) > > > + return; > > > list_add(&pc->lru, &mz->lists[lru]); > > > } > > > Can you add "VM_BUG_ON(PageCgroupAcctLRU(pc))" in mem_cgroup_add_lru_list() ? And you should change "list_empty(&pc->lru)" in mem_cgroup_lru_add_after_commit_swapcache() to "!PageCgroupAcctLRU(pc)". Thanks, Daisuke Nishimura. > > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, > > > css_put(&mem->css); > > > return; > > > } > > > + > > > pc->mem_cgroup = mem; > > > smp_wmb(); > > > - pc->flags = pcg_default_flags[ctype]; > > > + switch (ctype) { > > > + case MEM_CGROUP_CHARGE_TYPE_CACHE: > > > + case MEM_CGROUP_CHARGE_TYPE_SHMEM: > > > + SetPageCgroupCache(pc); > > > + SetPageCgroupUsed(pc); > > > + break; > > > + case MEM_CGROUP_CHARGE_TYPE_MAPPED: > > > + SetPageCgroupUsed(pc); > > > + break; > > > + default: > > > + break; > > > + } > > > + > > > + if (mem == root_mem_cgroup) > > > + SetPageCgroupRoot(pc); > > > > > > mem_cgroup_charge_statistics(mem, pc, true); > > > > > My concern here is there will be a racy moment that pc->flag shows > > PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup. > > > > Then, The order of code here should be > > == > > if (mem == root_mem_cgroup) > > SetPageCgroupRoot(pc); > > pc->mem_cgroup == mem;; > > smp_wmb(); > > switch(type) { > > case.... > > } > > // Used bit is set at last. > > == > > > > But I wonder it's better to use > > == > > static inline int page_cgroup_is_under_root(pc) > > { > > pc->mem_cgroup == root_mem_cgroup; > > } > > == > > I'm not sure why PageCgroupRoot() "bit" is necessary. 
> > Could you clarify the benefit of Root flag ? > > The Root flags was used for accounting, but I think we can start > removing it now. > > > > > > > > > > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > > > mem_cgroup_charge_statistics(mem, pc, false); > > > > > > ClearPageCgroupUsed(pc); > > > + if (mem == root_mem_cgroup) > > > + ClearPageCgroupRoot(pc); > > > /* > > > * pc->mem_cgroup is not cleared here. It will be accessed when it's > > > * freed from LRU. This is safe because uncharged page is expected not > > > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, > > > name = MEMFILE_ATTR(cft->private); > > > switch (name) { > > > case RES_LIMIT: > > > + if (memcg == root_mem_cgroup) { /* Can't set limit on root */ > > > + ret = -EINVAL; > > > + break; > > > + } > > > /* This function does all necessary parse...reuse it */ > > > ret = res_counter_memparse_write_strategy(buffer, &val); > > > if (ret) > > > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > > > if (cont->parent == NULL) { > > > enable_swap_cgroup(); > > > parent = NULL; > > > + root_mem_cgroup = mem; > > > } else { > > > parent = mem_cgroup_from_cont(cont->parent); > > > mem->use_hierarchy = parent->use_hierarchy; > > > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > > > return &mem->css; > > > free_out: > > > __mem_cgroup_free(mem); > > > + root_mem_cgroup = NULL; > > > return ERR_PTR(error); > > > } > > > > > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c > > > index ecc3918..4406a9c 100644 > > > --- a/mm/page_cgroup.c > > > +++ b/mm/page_cgroup.c > > > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) > > > > > > #endif > > > > > > - > > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > > > > > static DEFINE_MUTEX(swap_cgroup_mutex); > > > > > Unnecessary diff here. 
> > > > Yes, I'll add back the space. > > Thanks for the review > > -- > Balbir
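A rough userspace sketch of the test-and-clear suggestion made in this reply, assuming C11 atomic_fetch_and() as a stand-in for the kernel's test_and_clear_bit(). The names add_lru(), del_lru(), test_clear_acct_lru() and the zstat counter are hypothetical simplifications of mem_cgroup_add_lru_list(), mem_cgroup_del_lru_list() and MEM_CGROUP_ZSTAT(); list handling and locking are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the kernel macro: trap when the condition holds. */
#define VM_BUG_ON(cond) assert(!(cond))

enum { PCG_ACCT_LRU = 4 };	/* bit position chosen for the sketch only */

struct page_cgroup { atomic_ulong flags; };

static long zstat;	/* stand-in for MEM_CGROUP_ZSTAT(mz, lru) */

static bool acct_lru(struct page_cgroup *pc)
{
	return (atomic_load(&pc->flags) & (1UL << PCG_ACCT_LRU)) != 0;
}

/* Clear the bit and report whether it was set, in one atomic step. */
static bool test_clear_acct_lru(struct page_cgroup *pc)
{
	unsigned long mask = 1UL << PCG_ACCT_LRU;
	unsigned long old = atomic_fetch_and(&pc->flags, ~mask);

	return (old & mask) != 0;
}

static void add_lru(struct page_cgroup *pc)
{
	VM_BUG_ON(acct_lru(pc));	/* a second add would double count */
	zstat += 1;
	atomic_fetch_or(&pc->flags, 1UL << PCG_ACCT_LRU);
}

static void del_lru(struct page_cgroup *pc)
{
	if (!test_clear_acct_lru(pc))
		return;			/* never accounted, or already removed */
	zstat -= 1;
}
```

Folding the test and the clear into one operation means the flag alone decides whether the counter is decremented, so neither list_empty() nor a separate Clear helper is needed on this path.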
* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05 5:31 ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  2009-06-05 5:51   ` KAMEZAWA Hiroyuki
@ 2009-06-05 6:05   ` Daisuke Nishimura
  2009-06-05 9:47     ` Balbir Singh
  2009-06-05 6:43   ` Daisuke Nishimura
  2009-06-14 18:37   ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  3 siblings, 1 reply; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-05 6:05 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura

Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.

> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
I think we need ClearPageCgroupCache() here.
Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
A page can be reused, but we don't clear PCG_CACHE on free/alloc of page.

> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>
I think you should set PCG_ROOT before setting PCG_USED.
IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.

>  	mem_cgroup_charge_statistics(mem, pc, true);
>

Thanks,
Daisuke Nishimura.
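A small userspace sketch of the stale-bit problem described in this message, under the assumption that per-bit setters replace the old whole-word assignment to pc->flags. The helpers set_flag(), clear_flag(), test_flag(), the two-value charge_type enum and reuse_demo() are hypothetical simplifications, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

enum { PCG_CACHE, PCG_USED, PCG_ACCT_LRU };
enum charge_type { CHARGE_CACHE, CHARGE_MAPPED };

struct page_cgroup { unsigned long flags; };

static void set_flag(struct page_cgroup *pc, int bit)   { pc->flags |= 1UL << bit; }
static void clear_flag(struct page_cgroup *pc, int bit) { pc->flags &= ~(1UL << bit); }
static bool test_flag(const struct page_cgroup *pc, int bit)
{
	return ((pc->flags >> bit) & 1UL) != 0;
}

static void commit_charge(struct page_cgroup *pc, enum charge_type ctype)
{
	switch (ctype) {
	case CHARGE_CACHE:
		set_flag(pc, PCG_CACHE);
		break;
	case CHARGE_MAPPED:
		/* The previous user may have left PCG_CACHE set. */
		clear_flag(pc, PCG_CACHE);
		break;
	}
	set_flag(pc, PCG_USED);
}

static void uncharge(struct page_cgroup *pc)
{
	clear_flag(pc, PCG_USED);	/* PCG_CACHE is left as it was */
}

/* Usage: a file-cache page is freed and then reused as anonymous memory. */
static void reuse_demo(void)
{
	struct page_cgroup pc = { 0 };

	commit_charge(&pc, CHARGE_CACHE);
	uncharge(&pc);
	assert(test_flag(&pc, PCG_CACHE));	/* stale bit survives uncharge */
	commit_charge(&pc, CHARGE_MAPPED);
	assert(!test_flag(&pc, PCG_CACHE));	/* cleared by the mapped charge */
}
```

Clearing only the one bit leaves unrelated bits in the word, such as PCG_ACCT_LRU, untouched.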
* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  6:05 ` Daisuke Nishimura
@ 2009-06-05  9:47   ` Balbir Singh
  2009-06-08  0:03     ` Daisuke Nishimura
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-05  9:47 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-05 15:05:27]:

> Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.
>
> > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> I think we need ClearPageCgroupCache() here.
> Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
> A page can be reused, but we don't clear PCG_CACHE on free/alloc of page.

Yes, I know, I think it is best to set pc->flags to 0 before setting
the bits. Thanks!

>
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >
> I think you should set PCG_ROOT before setting PCG_USED.
> IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.

Kame pointed to something similar, I am going to remove PCG_ROOT in
the next version.

-- 
	Balbir
* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  9:47 ` Balbir Singh
@ 2009-06-08  0:03   ` Daisuke Nishimura
  0 siblings, 0 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  0:03 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Fri, 5 Jun 2009 17:47:21 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-05 15:05:27]:
>
> > Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.
> >
> > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> > >  		css_put(&mem->css);
> > >  		return;
> > >  	}
> > > +
> > >  	pc->mem_cgroup = mem;
> > >  	smp_wmb();
> > > -	pc->flags = pcg_default_flags[ctype];
> > > +	switch (ctype) {
> > > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > > +		SetPageCgroupCache(pc);
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > > +		SetPageCgroupUsed(pc);
> > I think we need ClearPageCgroupCache() here.
> > Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
> > A page can be reused, but we don't clear PCG_CACHE on free/alloc of page.
>
> Yes, I know, I think it is best to set pc->flags to 0 before setting
> the bits. Thanks!
>
I don't think clearing pc->flags is a good idea.
It can break the PCG_ACCT_LRU bit.
ClearPageCgroupCache() before SetPageCgroupUsed() in case of CHARGE_TYPE_MAPPED
would be enough.

Thanks,
Daisuke Nishimura.

> >
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > > +	if (mem == root_mem_cgroup)
> > > +		SetPageCgroupRoot(pc);
> > >
> > I think you should set PCG_ROOT before setting PCG_USED.
> > IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.
>
> Kame pointed to something similar, I am going to remove PCG_ROOT in
> the next version.
>
> -- 
> 	Balbir
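The difference between the two options discussed in this exchange (zeroing pc->flags versus clearing only PCG_CACHE) can be sketched in user space. The names `commit_mapped_zeroing`, `commit_mapped_clear_cache`, `has_flag` and `acct_lru_survives` are hypothetical, the bit operations are non-atomic stand-ins for the kernel helpers, and locking (including the PCG_LOCK bit) is deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>

enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };

#define PCG_BIT(b) (1UL << (b))

struct page_cgroup { unsigned long flags; };

static bool has_flag(const struct page_cgroup *pc, int bit)
{
	return (pc->flags & PCG_BIT(bit)) != 0;
}

/* Option 1: reset every bit, then mark the page used. */
void commit_mapped_zeroing(struct page_cgroup *pc)
{
	pc->flags = 0;
	pc->flags |= PCG_BIT(PCG_USED);
}

/* Option 2: clear only the stale cache bit, leave LRU accounting state alone. */
void commit_mapped_clear_cache(struct page_cgroup *pc)
{
	pc->flags &= ~PCG_BIT(PCG_CACHE);
	pc->flags |= PCG_BIT(PCG_USED);
}

/*
 * Start from a page_cgroup that is still accounted on the LRU and carries a
 * stale PCG_CACHE bit, commit it as a mapped page, and report whether
 * PCG_ACCT_LRU is still set afterwards.
 */
bool acct_lru_survives(void (*commit)(struct page_cgroup *))
{
	struct page_cgroup pc = { PCG_BIT(PCG_ACCT_LRU) | PCG_BIT(PCG_CACHE) };

	commit(&pc);
	assert(has_flag(&pc, PCG_USED));
	assert(!has_flag(&pc, PCG_CACHE));
	return has_flag(&pc, PCG_ACCT_LRU);
}
```

Both variants get rid of the stale cache bit, but only the second keeps the LRU accounting bit, so a later LRU removal in this model would still know that the page had been counted.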
* Re: Low overhead patches for the memory cgroup controller (v3) 2009-06-05 5:31 ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh 2009-06-05 5:51 ` KAMEZAWA Hiroyuki 2009-06-05 6:05 ` Daisuke Nishimura @ 2009-06-05 6:43 ` Daisuke Nishimura 2009-06-14 18:37 ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh 3 siblings, 0 replies; 30+ messages in thread From: Daisuke Nishimura @ 2009-06-05 6:43 UTC (permalink / raw) To: balbir Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura On Fri, 5 Jun 2009 13:31:07 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Here is the new version of the patch with the RFC dropped. Andrew, > Kame, could you please take a look. I am just about to fly out to get > back home tomorrow, so there might be some silence, unless I get to > the next WiFi enabled airport. > > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > Changelog v3 -> v2 > > 1. Rebase to mmotm 2nd June 2009 > 2. Test with some of the test cases recommended by Daisuke-San > > Changelog v2 -> v1 > 1. Fix and implement review comments. > > Feature: Remove the overhead associated with the root cgroup > > This patch changes the memory cgroup and removes the overhead associated > with accounting all pages in the root cgroup. As a side-effect, we can > no longer set a memory hard limit in the root cgroup. > > A new flag is used to track page_cgroup associated with the root cgroup > pages. A new flag to track whether the page has been accounted or not > has been added as well. Flags are now set atomically for page_cgroup, > pcg_default_flags is now obsolete, but I've not removed it yet. It > provides some readability to help the code. > > Tests Results: > > Obtained by > > 1. Using tmpfs for mounting filesystem > 2. Changing sync to be /bin/true (so that sync is not the bottleneck) > 3. 
Used -s #cpus*40 -e #cpus*40 > > Reaim > withoutpatch patch > AIM9 9532.48 9807.59 > dbase 19344.60 19285.71 > new_dbase 20101.65 20163.13 > shared 11827.77 11886.65 > compute 17317.38 17420.05 > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > > include/linux/page_cgroup.h | 12 ++++++++++++ > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- > mm/page_cgroup.c | 1 - > 3 files changed, 50 insertions(+), 5 deletions(-) > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > index 7339c7b..41cc16c 100644 > --- a/include/linux/page_cgroup.h > +++ b/include/linux/page_cgroup.h > @@ -26,6 +26,8 @@ enum { > PCG_LOCK, /* page cgroup is locked */ > PCG_CACHE, /* charged as cache */ > PCG_USED, /* this object is in use. */ > + PCG_ROOT, /* page belongs to root cgroup */ > + PCG_ACCT_LRU, /* page has been accounted for */ > }; > > #define TESTPCGFLAG(uname, lname) \ > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ > > /* Cache flag is set only once (at allocation) */ > TESTPCGFLAG(Cache, CACHE) > +SETPCGFLAG(Cache, CACHE) > > TESTPCGFLAG(Used, USED) > CLEARPCGFLAG(Used, USED) > +SETPCGFLAG(Used, USED) > + > +SETPCGFLAG(Root, ROOT) > +CLEARPCGFLAG(Root, ROOT) > +TESTPCGFLAG(Root, ROOT) > + > +SETPCGFLAG(AcctLru, ACCT_LRU) > +CLEARPCGFLAG(AcctLru, ACCT_LRU) > +TESTPCGFLAG(AcctLru, ACCT_LRU) > > static inline int page_cgroup_nid(struct page_cgroup *pc) > { > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index a83e039..9561d10 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -43,6 +43,7 @@ > > struct cgroup_subsys mem_cgroup_subsys __read_mostly; > #define MEM_CGROUP_RECLAIM_RETRIES 5 > +struct mem_cgroup *root_mem_cgroup __read_mostly; > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */ > @@ -197,6 +198,10 @@ enum charge_type { > #define PCGF_CACHE (1UL << PCG_CACHE) > #define PCGF_USED (1UL << 
PCG_USED) > #define PCGF_LOCK (1UL << PCG_LOCK) > +/* Not used, but added here for completeness */ > +#define PCGF_ROOT (1UL << PCG_ROOT) > +#define PCGF_ACCT (1UL << PCG_ACCT) > + > static const unsigned long > pcg_default_flags[NR_CHARGE_TYPE] = { > PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */ > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > return; > pc = lookup_page_cgroup(page); > /* can happen while we handle swapcache. */ > - if (list_empty(&pc->lru) || !pc->mem_cgroup) > + if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup) > return; > /* > * We don't check PCG_USED bit. It's cleared when the "page" is finally > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > mz = page_cgroup_zoneinfo(pc); > mem = pc->mem_cgroup; > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > + ClearPageCgroupAcctLru(pc); > + if (PageCgroupRoot(pc)) > + return; > list_del_init(&pc->lru); > return; > } > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) > * For making pc->mem_cgroup visible, insert smp_rmb() here. > */ > smp_rmb(); > - /* unused page is not rotated. */ > - if (!PageCgroupUsed(pc)) > + /* unused or root page is not rotated. 
*/ > + if (!PageCgroupUsed(pc) || PageCgroupRoot(pc)) > return; > mz = page_cgroup_zoneinfo(pc); > list_move(&pc->lru, &mz->lists[lru]); > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > > mz = page_cgroup_zoneinfo(pc); > MEM_CGROUP_ZSTAT(mz, lru) += 1; > + SetPageCgroupAcctLru(pc); > + if (PageCgroupRoot(pc)) > + return; > list_add(&pc->lru, &mz->lists[lru]); > } > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, > css_put(&mem->css); > return; > } > + > pc->mem_cgroup = mem; > smp_wmb(); > - pc->flags = pcg_default_flags[ctype]; > + switch (ctype) { > + case MEM_CGROUP_CHARGE_TYPE_CACHE: > + case MEM_CGROUP_CHARGE_TYPE_SHMEM: > + SetPageCgroupCache(pc); > + SetPageCgroupUsed(pc); > + break; > + case MEM_CGROUP_CHARGE_TYPE_MAPPED: > + SetPageCgroupUsed(pc); > + break; > + default: > + break; > + } > + > + if (mem == root_mem_cgroup) > + SetPageCgroupRoot(pc); > > mem_cgroup_charge_statistics(mem, pc, true); > > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > mem_cgroup_charge_statistics(mem, pc, false); > > ClearPageCgroupUsed(pc); > + if (mem == root_mem_cgroup) > + ClearPageCgroupRoot(pc); If we clear PCG_ROOT here, I think we cannot trust PageCgroupRoot() in mem_cgroup_del_lru_list(). And, if we never clear it on free path, we should clear it on commit_charge if mem != root_mem_cgroup. Thanks, Daisuke Nishimura. > /* > * pc->mem_cgroup is not cleared here. It will be accessed when it's > * freed from LRU. 
This is safe because uncharged page is expected not > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, > name = MEMFILE_ATTR(cft->private); > switch (name) { > case RES_LIMIT: > + if (memcg == root_mem_cgroup) { /* Can't set limit on root */ > + ret = -EINVAL; > + break; > + } > /* This function does all necessary parse...reuse it */ > ret = res_counter_memparse_write_strategy(buffer, &val); > if (ret) > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > if (cont->parent == NULL) { > enable_swap_cgroup(); > parent = NULL; > + root_mem_cgroup = mem; > } else { > parent = mem_cgroup_from_cont(cont->parent); > mem->use_hierarchy = parent->use_hierarchy; > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > return &mem->css; > free_out: > __mem_cgroup_free(mem); > + root_mem_cgroup = NULL; > return ERR_PTR(error); > } > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c > index ecc3918..4406a9c 100644 > --- a/mm/page_cgroup.c > +++ b/mm/page_cgroup.c > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) > > #endif > > - > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > static DEFINE_MUTEX(swap_cgroup_mutex); > > -- > Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
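The two PCG_ROOT lifetime problems raised in this review can be modelled in user space. This is a sketch under assumptions, not the kernel code path: `root_cg`, `child_cg`, `commit_v3`, `commit_sync_root`, `uncharge_v3`, `uncharge_keep_root` and the two scenario helpers are hypothetical names, and only the flag value that a later LRU function would observe is modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum { PCG_USED, PCG_ROOT };

struct mem_cgroup { int unused; };
struct page_cgroup { unsigned long flags; struct mem_cgroup *mem_cgroup; };

static struct mem_cgroup root_cg, child_cg;	/* hypothetical cgroups */

static void set_flag(struct page_cgroup *pc, int bit)   { pc->flags |=  (1UL << bit); }
static void clear_flag(struct page_cgroup *pc, int bit) { pc->flags &= ~(1UL << bit); }
static bool test_flag(const struct page_cgroup *pc, int bit)
{
	return (pc->flags >> bit) & 1UL;
}

typedef void (*commit_fn)(struct page_cgroup *, struct mem_cgroup *);
typedef void (*uncharge_fn)(struct page_cgroup *);

/* v3: PCG_ROOT is set for root charges and never cleared at commit time. */
void commit_v3(struct page_cgroup *pc, struct mem_cgroup *mem)
{
	pc->mem_cgroup = mem;
	set_flag(pc, PCG_USED);
	if (mem == &root_cg)
		set_flag(pc, PCG_ROOT);
}

/* Alternative: make PCG_ROOT reflect the new owner on every commit. */
void commit_sync_root(struct page_cgroup *pc, struct mem_cgroup *mem)
{
	pc->mem_cgroup = mem;
	if (mem == &root_cg)
		set_flag(pc, PCG_ROOT);
	else
		clear_flag(pc, PCG_ROOT);
	set_flag(pc, PCG_USED);
}

/* v3: uncharge drops PCG_ROOT together with PCG_USED. */
void uncharge_v3(struct page_cgroup *pc)
{
	clear_flag(pc, PCG_USED);
	if (pc->mem_cgroup == &root_cg)
		clear_flag(pc, PCG_ROOT);
}

/* Alternative: uncharge leaves PCG_ROOT in place for the later LRU removal. */
void uncharge_keep_root(struct page_cgroup *pc)
{
	clear_flag(pc, PCG_USED);
}

/* A root page is uncharged before it leaves the LRU: is PCG_ROOT still visible? */
bool root_flag_seen_at_lru_del(uncharge_fn uncharge)
{
	struct page_cgroup pc = { 0 };

	commit_v3(&pc, &root_cg);
	uncharge(&pc);
	assert(pc.mem_cgroup == &root_cg);	/* owner pointer is kept on uncharge */
	return test_flag(&pc, PCG_ROOT);
}

/* A former root page is charged to a child cgroup: is PCG_ROOT stale? */
bool stale_root_after_recharge(commit_fn commit)
{
	struct page_cgroup pc = { 0 };

	commit(&pc, &root_cg);
	uncharge_keep_root(&pc);
	commit(&pc, &child_cg);
	return test_flag(&pc, PCG_ROOT);
}
```

In the model, clearing the bit at uncharge hides it from a later LRU removal, while never clearing it leaves a stale bit on a page that now belongs to a child cgroup unless commit resets it.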
* Low overhead patches for the memory cgroup controller (v4) 2009-06-05 5:31 ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh ` (2 preceding siblings ...) 2009-06-05 6:43 ` Daisuke Nishimura @ 2009-06-14 18:37 ` Balbir Singh 2009-06-15 2:04 ` KAMEZAWA Hiroyuki 2009-06-15 2:18 ` Daisuke Nishimura 3 siblings, 2 replies; 30+ messages in thread From: Balbir Singh @ 2009-06-14 18:37 UTC (permalink / raw) To: KAMEZAWA Hiroyuki, Andrew Morton Cc: linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro Here is v4 of the patches, please review and comment Feature: Remove the overhead associated with the root cgroup From: Balbir Singh <balbir@linux.vnet.ibm.com> changelog v4 -> v3 1. Rebase to mmotm 9th june 2009 2. Remove PageCgroupRoot, we have account LRU flags to indicate that we do only accounting and no reclaim. 3. pcg_default_flags has been used again, since PCGF_ROOT is gone, we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list 4. More LRU functions are aware of PageCgroupAcctLRU Changelog v3 -> v2 1. Rebase to mmotm 2nd June 2009 2. Test with some of the test cases recommended by Daisuke-San Changelog v2 -> v1 1. Rebase to latest mmotm This patch changes the memory cgroup and removes the overhead associated with accounting all pages in the root cgroup. As a side-effect, we can no longer set a memory hard limit in the root cgroup. A new flag to track whether the page has been accounted or not has been added as well. Flags are now set atomically for page_cgroup, Tests: Results (for v2) Obtained by 1. Using tmpfs for mounting filesystem 2. Changing sync to be /bin/true (so that sync is not the bottleneck) 3. 
Used -s #cpus*40 -e #cpus*40 Reaim withoutpatch patch AIM9 9532.48 9807.59 dbase 19344.60 19285.71 new_dbase 20101.65 20163.13 shared 11827.77 11886.65 compute 17317.38 17420.05 Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- include/linux/page_cgroup.h | 5 ++++ mm/memcontrol.c | 59 ++++++++++++++++++++++++++++++++++++------- 2 files changed, 54 insertions(+), 10 deletions(-) diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index 7339c7b..57c4d50 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -26,6 +26,7 @@ enum { PCG_LOCK, /* page cgroup is locked */ PCG_CACHE, /* charged as cache */ PCG_USED, /* this object is in use. */ + PCG_ACCT_LRU, /* page has been accounted for */ }; #define TESTPCGFLAG(uname, lname) \ @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE) TESTPCGFLAG(Used, USED) CLEARPCGFLAG(Used, USED) +SETPCGFLAG(AcctLRU, ACCT_LRU) +CLEARPCGFLAG(AcctLRU, ACCT_LRU) +TESTPCGFLAG(AcctLRU, ACCT_LRU) + static inline int page_cgroup_nid(struct page_cgroup *pc) { return page_to_nid(pc->page); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6ceb6f2..399d416 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -43,6 +43,7 @@ struct cgroup_subsys mem_cgroup_subsys __read_mostly; #define MEM_CGROUP_RECLAIM_RETRIES 5 +struct mem_cgroup *root_mem_cgroup __read_mostly; #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */ @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem); static void mem_cgroup_put(struct mem_cgroup *mem); static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem); +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem) +{ + return (mem == root_mem_cgroup); +} + static void mem_cgroup_charge_statistics(struct mem_cgroup *mem, struct page_cgroup *pc, bool charge) @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) return; pc = lookup_page_cgroup(page); 
/* can happen while we handle swapcache. */ - if (list_empty(&pc->lru) || !pc->mem_cgroup) + mem = pc->mem_cgroup; + if (!mem) + return; + if (mem_cgroup_is_root(mem)) { + if (!PageCgroupAcctLRU(pc)) + return; + } else if (list_empty(&pc->lru)) return; + /* * We don't check PCG_USED bit. It's cleared when the "page" is finally * removed from global LRU. */ mz = page_cgroup_zoneinfo(pc); - mem = pc->mem_cgroup; MEM_CGROUP_ZSTAT(mz, lru) -= 1; + if (PageCgroupAcctLRU(pc)) { + ClearPageCgroupAcctLRU(pc); + return; + } list_del_init(&pc->lru); return; } @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) * For making pc->mem_cgroup visible, insert smp_rmb() here. */ smp_rmb(); - /* unused page is not rotated. */ - if (!PageCgroupUsed(pc)) + /* unused or root page is not rotated. */ + if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc)) return; mz = page_cgroup_zoneinfo(pc); list_move(&pc->lru, &mz->lists[lru]); @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) mz = page_cgroup_zoneinfo(pc); MEM_CGROUP_ZSTAT(mz, lru) += 1; + if (mem_cgroup_is_root(pc->mem_cgroup)) { + SetPageCgroupAcctLRU(pc); + return; + } list_add(&pc->lru, &mz->lists[lru]); } @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) * it again. This function is only used to charge SwapCache. It's done under * lock_page and expected that zone->lru_lock is never held. */ -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page) +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page, + struct page_cgroup *pc) { unsigned long flags; struct zone *zone = page_zone(page); - struct page_cgroup *pc = lookup_page_cgroup(page); + if (!pc->mem_cgroup || + (!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup))) + return; spin_lock_irqsave(&zone->lru_lock, flags); /* * Forget old LRU when this page_cgroup is *not* used. 
This Used bit @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page) spin_unlock_irqrestore(&zone->lru_lock, flags); } -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page) +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page, + struct page_cgroup *pc) { unsigned long flags; struct zone *zone = page_zone(page); - struct page_cgroup *pc = lookup_page_cgroup(page); + if (!pc->mem_cgroup || + (!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup))) + return; spin_lock_irqsave(&zone->lru_lock, flags); /* link when the page is linked to LRU but page_cgroup isn't */ if (PageLRU(page) && list_empty(&pc->lru)) @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page) void mem_cgroup_move_lists(struct page *page, enum lru_list from, enum lru_list to) { + struct page_cgroup *pc = lookup_page_cgroup(page); if (mem_cgroup_disabled()) return; + smp_rmb(); + if (!pc->mem_cgroup || + (!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup))) + return; mem_cgroup_del_lru_list(page, from); mem_cgroup_add_lru_list(page, to); } @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, css_put(&mem->css); return; } + pc->mem_cgroup = mem; smp_wmb(); pc->flags = pcg_default_flags[ctype]; @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr, if (!ptr) return; pc = lookup_page_cgroup(page); - mem_cgroup_lru_del_before_commit_swapcache(page); + smp_rmb(); + mem_cgroup_lru_del_before_commit_swapcache(page, pc); __mem_cgroup_commit_charge(ptr, pc, ctype); - mem_cgroup_lru_add_after_commit_swapcache(page); + mem_cgroup_lru_add_after_commit_swapcache(page, pc); /* * Now swap is on-memory. This means this page may be * counted both as mem and swap....double count. 
@@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, name = MEMFILE_ATTR(cft->private); switch (name) { case RES_LIMIT: + if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ + ret = -EINVAL; + break; + } /* This function does all necessary parse...reuse it */ ret = res_counter_memparse_write_strategy(buffer, &val); if (ret) @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) if (cont->parent == NULL) { enable_swap_cgroup(); parent = NULL; + root_mem_cgroup = mem; } else { parent = mem_cgroup_from_cont(cont->parent); mem->use_hierarchy = parent->use_hierarchy; @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) return &mem->css; free_out: __mem_cgroup_free(mem); + root_mem_cgroup = NULL; return ERR_PTR(error); } -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
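The RES_LIMIT hunk in the v4 posting above (no hard limit on the root cgroup) can be sketched as a stand-alone model. Everything here is illustrative: `model_create` and `model_write_limit` are hypothetical helpers, the `limit` field replaces the kernel's res_counter, and parsing of the written string is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a single limit field instead of a res_counter. */
struct mem_cgroup { unsigned long long limit; };

static struct mem_cgroup *root_mem_cgroup;

static bool mem_cgroup_is_root(const struct mem_cgroup *mem)
{
	return mem == root_mem_cgroup;
}

/* Mirrors the create hook in the patch: the cgroup without a parent is the root. */
void model_create(struct mem_cgroup *mem, const struct mem_cgroup *parent)
{
	assert(mem != NULL);
	mem->limit = ULLONG_MAX;	/* "unlimited" in this model */
	if (parent == NULL)
		root_mem_cgroup = mem;
}

/* Mirrors the RES_LIMIT branch: refuse to set a limit on the root cgroup. */
int model_write_limit(struct mem_cgroup *memcg, unsigned long long val)
{
	if (mem_cgroup_is_root(memcg))
		return -EINVAL;
	memcg->limit = val;
	return 0;
}
```

With a root and one child created through `model_create`, writing a limit to the root returns -EINVAL and leaves it unlimited, while the same write to the child succeeds.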
* Re: Low overhead patches for the memory cgroup controller (v4) 2009-06-14 18:37 ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh @ 2009-06-15 2:04 ` KAMEZAWA Hiroyuki 2009-06-15 2:18 ` Daisuke Nishimura 1 sibling, 0 replies; 30+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-06-15 2:04 UTC (permalink / raw) To: balbir Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro On Mon, 15 Jun 2009 00:07:40 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Here is v4 of the patches, please review and comment > > Feature: Remove the overhead associated with the root cgroup > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > changelog v4 -> v3 > 1. Rebase to mmotm 9th june 2009 > 2. Remove PageCgroupRoot, we have account LRU flags to indicate that > we do only accounting and no reclaim. > 3. pcg_default_flags has been used again, since PCGF_ROOT is gone, > we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list > 4. More LRU functions are aware of PageCgroupAcctLRU > > Changelog v3 -> v2 > > 1. Rebase to mmotm 2nd June 2009 > 2. Test with some of the test cases recommended by Daisuke-San > > Changelog v2 -> v1 > 1. Rebase to latest mmotm > > This patch changes the memory cgroup and removes the overhead associated > with accounting all pages in the root cgroup. As a side-effect, we can > no longer set a memory hard limit in the root cgroup. > > A new flag to track whether the page has been accounted or not > has been added as well. Flags are now set atomically for page_cgroup, > > Tests: > > Results (for v2) > > Obtained by > > 1. Using tmpfs for mounting filesystem > 2. Changing sync to be /bin/true (so that sync is not the bottleneck) > 3. 
Used -s #cpus*40 -e #cpus*40 > > Reaim > withoutpatch patch > AIM9 9532.48 9807.59 > dbase 19344.60 19285.71 > new_dbase 20101.65 20163.13 > shared 11827.77 11886.65 > compute 17317.38 17420.05 > Hmm, how much overhead this patch adds for non-root cgroup ? It seems getting better in general. But I have a few suggestions. > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > > include/linux/page_cgroup.h | 5 ++++ > mm/memcontrol.c | 59 ++++++++++++++++++++++++++++++++++++------- > 2 files changed, 54 insertions(+), 10 deletions(-) > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > index 7339c7b..57c4d50 100644 > --- a/include/linux/page_cgroup.h > +++ b/include/linux/page_cgroup.h > @@ -26,6 +26,7 @@ enum { > PCG_LOCK, /* page cgroup is locked */ > PCG_CACHE, /* charged as cache */ > PCG_USED, /* this object is in use. */ > + PCG_ACCT_LRU, /* page has been accounted for */ > }; > > #define TESTPCGFLAG(uname, lname) \ > @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE) > TESTPCGFLAG(Used, USED) > CLEARPCGFLAG(Used, USED) > > +SETPCGFLAG(AcctLRU, ACCT_LRU) > +CLEARPCGFLAG(AcctLRU, ACCT_LRU) > +TESTPCGFLAG(AcctLRU, ACCT_LRU) > + > static inline int page_cgroup_nid(struct page_cgroup *pc) > { > return page_to_nid(pc->page); > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6ceb6f2..399d416 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -43,6 +43,7 @@ > > struct cgroup_subsys mem_cgroup_subsys __read_mostly; > #define MEM_CGROUP_RECLAIM_RETRIES 5 > +struct mem_cgroup *root_mem_cgroup __read_mostly; > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */ > @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem); > static void mem_cgroup_put(struct mem_cgroup *mem); > static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem); > > +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem) > +{ > + return (mem == 
root_mem_cgroup); > +} > + > static void mem_cgroup_charge_statistics(struct mem_cgroup *mem, > struct page_cgroup *pc, > bool charge) > @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > return; > pc = lookup_page_cgroup(page); > /* can happen while we handle swapcache. */ > - if (list_empty(&pc->lru) || !pc->mem_cgroup) > + mem = pc->mem_cgroup; > + if (!mem) > + return; > + if (mem_cgroup_is_root(mem)) { > + if (!PageCgroupAcctLRU(pc)) > + return; > + } else if (list_empty(&pc->lru)) > return; > + > /* > * We don't check PCG_USED bit. It's cleared when the "page" is finally > * removed from global LRU. > */ > mz = page_cgroup_zoneinfo(pc); > - mem = pc->mem_cgroup; > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > + if (PageCgroupAcctLRU(pc)) { > + ClearPageCgroupAcctLRU(pc); > + return; > + } > list_del_init(&pc->lru); > return; > } Looking through the whole code, PageCgroupAcctLRU() is meaningful only when pc->mem_cgroup == root_mem_cgroup. Right ? I wonder making PageCgroupAcctLRU() be always meaningful and remove all !list_empty(&pc->lru) check is a way to go. If do so, this function can be written as == if (!PageCgroupAcctLRU(pc)) return; mem = pc->mem_cgroup; mz = page_cgroup_zoneinfo(pc); MEM_CGROUP_ZSTAT(mz, lru) -= 1; ClearPageCgroupAcctLRU(pc); /* We don't maintain LRU for root cgroup. Global LRU works for us. */ if (!mem_cgroup_is_root(mem)) list_del_init(&pc->lru); == This seems much straightforward. > @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) > * For making pc->mem_cgroup visible, insert smp_rmb() here. > */ > smp_rmb(); > - /* unused page is not rotated. */ > - if (!PageCgroupUsed(pc)) > + /* unused or root page is not rotated. 
*/ > + if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc)) > return; > mz = page_cgroup_zoneinfo(pc); > list_move(&pc->lru, &mz->lists[lru]); > @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > > mz = page_cgroup_zoneinfo(pc); > MEM_CGROUP_ZSTAT(mz, lru) += 1; > + if (mem_cgroup_is_root(pc->mem_cgroup)) { > + SetPageCgroupAcctLRU(pc); > + return; > + } > list_add(&pc->lru, &mz->lists[lru]); > } With above (my) rule. Here will be SetPageCgroupAcctLRU(pc); if (!mem_cgroup_is_root(pc->mem_cgroup)) list_add(&pc->lru, &mz->lists[lru]); > @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > * it again. This function is only used to charge SwapCache. It's done under > * lock_page and expected that zone->lru_lock is never held. > */ > -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page) > +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page, > + struct page_cgroup *pc) > { > unsigned long flags; > struct zone *zone = page_zone(page); > - struct page_cgroup *pc = lookup_page_cgroup(page); > > + if (!pc->mem_cgroup || > + (!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup))) > + return; PageCgroupAcctLRU() check is done without zone->lock and this is racy if you check flag. Considering how "pagevec" works, this race tend to be big. > spin_lock_irqsave(&zone->lru_lock, flags); > /* > * Forget old LRU when this page_cgroup is *not* used. 
This Used bit > @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page) > spin_unlock_irqrestore(&zone->lru_lock, flags); > } > > -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page) > +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page, > + struct page_cgroup *pc) > { > unsigned long flags; > struct zone *zone = page_zone(page); > - struct page_cgroup *pc = lookup_page_cgroup(page); > > + if (!pc->mem_cgroup || > + (!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup))) > + return; The same comment as above. > spin_lock_irqsave(&zone->lru_lock, flags); > /* link when the page is linked to LRU but page_cgroup isn't */ > if (PageLRU(page) && list_empty(&pc->lru)) > @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page) > void mem_cgroup_move_lists(struct page *page, > enum lru_list from, enum lru_list to) > { > + struct page_cgroup *pc = lookup_page_cgroup(page); > if (mem_cgroup_disabled()) > return; > + smp_rmb(); > + if (!pc->mem_cgroup || > + (!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup))) > + return; > mem_cgroup_del_lru_list(page, from); > mem_cgroup_add_lru_list(page, to); > } Here, too. > @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, > css_put(&mem->css); > return; > } > + > pc->mem_cgroup = mem; > smp_wmb(); > pc->flags = pcg_default_flags[ctype]; > @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr, > if (!ptr) > return; > pc = lookup_page_cgroup(page); > - mem_cgroup_lru_del_before_commit_swapcache(page); > + smp_rmb(); > + mem_cgroup_lru_del_before_commit_swapcache(page, pc); > __mem_cgroup_commit_charge(ptr, pc, ctype); > - mem_cgroup_lru_add_after_commit_swapcache(page); > + mem_cgroup_lru_add_after_commit_swapcache(page, pc); Why this change ? When you adds memory barrier, plz add comments. > /* > * Now swap is on-memory. 
This means this page may be > * counted both as mem and swap....double count. > @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, > name = MEMFILE_ATTR(cft->private); > switch (name) { > case RES_LIMIT: > + if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ > + ret = -EINVAL; > + break; > + } Could you add modification to Documentation in the next post ? > /* This function does all necessary parse...reuse it */ > ret = res_counter_memparse_write_strategy(buffer, &val); > if (ret) > @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > if (cont->parent == NULL) { > enable_swap_cgroup(); > parent = NULL; > + root_mem_cgroup = mem; > } else { > parent = mem_cgroup_from_cont(cont->parent); > mem->use_hierarchy = parent->use_hierarchy; > @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > return &mem->css; > free_out: > __mem_cgroup_free(mem); > + root_mem_cgroup = NULL; > return ERR_PTR(error); > } > Could you start next thread in the next post ? Once I read and make this from unread to read, this goes far deep of old mail tree ;) Regards, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
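The rule proposed in this review (PCG_ACCT_LRU is always meaningful, and only non-root cgroups link pages onto their own LRU list) can be tried out in a small user-space model. This is a sketch under assumptions: the list helpers are minimal stand-ins for <linux/list.h>, the per-zone structures are collapsed into one counter and one list per cgroup, and the `model_*` functions are hypothetical names, with no locking modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for the kernel's list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical model: one counter and one list per cgroup, no zones. */
struct mem_cgroup { long nr_lru; struct list_head lru; };

struct page_cgroup {
	bool acct_lru;			/* models PCG_ACCT_LRU */
	struct mem_cgroup *mem_cgroup;
	struct list_head lru;
};

static struct mem_cgroup *root_mem_cgroup;

static bool mem_cgroup_is_root(const struct mem_cgroup *mem)
{
	return mem == root_mem_cgroup;
}

void model_init_cgroup(struct mem_cgroup *mem, bool is_root)
{
	mem->nr_lru = 0;
	INIT_LIST_HEAD(&mem->lru);
	if (is_root)
		root_mem_cgroup = mem;
}

void model_init_page(struct page_cgroup *pc, struct mem_cgroup *mem)
{
	pc->acct_lru = false;
	pc->mem_cgroup = mem;
	INIT_LIST_HEAD(&pc->lru);
}

/* Always count and flag the page; link it only for non-root cgroups. */
void model_add_lru(struct page_cgroup *pc)
{
	pc->mem_cgroup->nr_lru += 1;
	pc->acct_lru = true;
	if (!mem_cgroup_is_root(pc->mem_cgroup))
		list_add(&pc->lru, &pc->mem_cgroup->lru);
}

/* The flag alone decides whether there is anything to undo. */
void model_del_lru(struct page_cgroup *pc)
{
	if (!pc->acct_lru)
		return;
	pc->mem_cgroup->nr_lru -= 1;
	pc->acct_lru = false;
	/* No per-cgroup LRU is kept for root; the global LRU covers it. */
	if (!mem_cgroup_is_root(pc->mem_cgroup)) {
		assert(!list_empty(&pc->lru));
		list_del_init(&pc->lru);
	}
}
```

Under this rule a second `model_del_lru()` on the same page is a no-op for root and non-root pages alike, which is what removes the need for separate list_empty() checks in the model.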
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-14 18:37 ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  2009-06-15  2:04   ` KAMEZAWA Hiroyuki
@ 2009-06-15  2:18   ` Daisuke Nishimura
  2009-06-15  2:23     ` KAMEZAWA Hiroyuki
  2009-06-15  3:00     ` Balbir Singh
  1 sibling, 2 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  2:18 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Here is v4 of the patches, please review and comment
>
> Feature: Remove the overhead associated with the root cgroup
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> changelog v4 -> v3
> 1. Rebase to mmotm 9th june 2009
> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>    we do only accounting and no reclaim.
Hmm, I prefer the meaning PCG_ACCT_LRU had in the previous version.
It can be used to remove the annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.

> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
outside of zone->lru_lock.

IMHO, the most complicated case is a SwapCache page which has been read ahead by
a *different* cpu from the cpu doing do_swap_page(). Such SwapCache pages can be
on a pagevec and be drained to the LRU asynchronously with respect to do_swap_page().
Well, yes, it would be safe just because PCGF_ACCT_LRU would not be set
if PCGF_USED has not been set, but I don't think it's a good idea to touch
PCGF_ACCT_LRU outside of zone->lru_lock anyway.

Wouldn't a patch like the one below work for you?
Lightly tested under global memory pressure (w/o memcg's memory pressure)
on a small machine (it has been modified a bit since then, though).
===
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 include/linux/page_cgroup.h |   13 ++++++++++
 mm/memcontrol.c             |   54 +++++++++++++++++++++++++++++++-----------
 2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..debd8ba 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,7 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };

 #define TESTPCGFLAG(uname, lname)			\
@@ -40,11 +41,23 @@ static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
 static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 	{ clear_bit(PCG_##lname, &pc->flags);  }

+#define TESTCLEARPCGFLAG(uname, lname)			\
+static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
+	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
+
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+CLEARPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)

 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(AcctLRU, ACCT_LRU)
+CLEARPCGFLAG(AcctLRU, ACCT_LRU)
+TESTPCGFLAG(AcctLRU, ACCT_LRU)
+TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)

 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dbece65..820f3e6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@

 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;

 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -200,13 +201,8 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
-static const unsigned long
-pcg_default_flags[NR_CHARGE_TYPE] = {
-	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
-	PCGF_USED | PCGF_LOCK, /* Anon */
-	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* Shmem */
-	0, /* FORCE */
-};
+/* Not used, but added here for completeness */
+#define PCGF_ACCT	(1UL << PCG_ACCT)

 /* for encoding cft->private value on file */
 #define _MEM			(0)
@@ -354,6 +350,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 	return ret;
 }

+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -371,22 +372,24 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 {
 	struct page_cgroup *pc;
-	struct mem_cgroup *mem;
 	struct mem_cgroup_per_zone *mz;

 	if (mem_cgroup_disabled())
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if (!TestClearPageCgroupAcctLRU(pc))
 		return;
+	VM_BUG_ON(!pc->mem_cgroup);
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
 	 * removed from global LRU.
 	 */
 	mz = page_cgroup_zoneinfo(pc);
-	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	if (mem_cgroup_is_root(pc->mem_cgroup))
+		return;
+	VM_BUG_ON(list_empty(&pc->lru));
 	list_del_init(&pc->lru);
 	return;
 }
@@ -410,8 +413,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -425,6 +428,7 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 	if (mem_cgroup_disabled())
 		return;
 	pc = lookup_page_cgroup(page);
+	VM_BUG_ON(PageCgroupAcctLRU(pc));
 	/*
 	 * Used bit is set without atomic ops but after smp_wmb().
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
@@ -435,6 +439,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)

 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcctLRU(pc);
+	if (mem_cgroup_is_root(pc->mem_cgroup))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }

@@ -469,7 +476,7 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)

 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/* link when the page is linked to LRU but page_cgroup isn't */
-	if (PageLRU(page) && list_empty(&pc->lru))
+	if (PageLRU(page) && !PageCgroupAcctLRU(pc))
 		mem_cgroup_add_lru_list(page, page_lru(page));
 	spin_unlock_irqrestore(&zone->lru_lock, flags);
 }
@@ -1106,9 +1113,22 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		ClearPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}

 	mem_cgroup_charge_statistics(mem, pc, true);

@@ -2047,6 +2067,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2513,6 +2537,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2541,6 +2566,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }

===


Thanks,
Daisuke Nishimura.

> 4. More LRU functions are aware of PageCgroupAcctLRU
>
> Changelog v3 -> v2
>
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
>
> Changelog v2 -> v1
> 1. Rebase to latest mmotm
>
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
>
> A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
>
> Tests:
>
> Results (for v2)
>
> Obtained by
>
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
>
> Reaim
>              withoutpatch   patch
> AIM9         9532.48        9807.59
> dbase        19344.60       19285.71
> new_dbase    20101.65       20163.13
> shared       11827.77       11886.65
> compute      17317.38       17420.05
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
>  include/linux/page_cgroup.h |    5 ++++
>  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 54 insertions(+), 10 deletions(-)
>
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..57c4d50 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6ceb6f2..399d416 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
>  					 struct page_cgroup *pc,
>  					 bool charge)
> @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	mem = pc->mem_cgroup;
> +	if (!mem)
> +		return;
> +	if (mem_cgroup_is_root(mem)) {
> +		if (!PageCgroupAcctLRU(pc))
> +			return;
> +	} else if (list_empty(&pc->lru))
>  		return;
> +
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (PageCgroupAcctLRU(pc)) {
> +		ClearPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> +		SetPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>
> @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>   * it again. This function is only used to charge SwapCache. It's done under
>   * lock_page and expected that zone->lru_lock is never held.
>   */
> -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> +					struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/*
>  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
>
> -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> +					struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
>  	if (PageLRU(page) && list_empty(&pc->lru))
> @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  void mem_cgroup_move_lists(struct page *page,
>  			   enum lru_list from, enum lru_list to)
>  {
> +	struct page_cgroup *pc = lookup_page_cgroup(page);
>  	if (mem_cgroup_disabled())
>  		return;
> +	smp_rmb();
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	mem_cgroup_del_lru_list(page, from);
>  	mem_cgroup_add_lru_list(page, to);
>  }
> @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
>  	pc->flags = pcg_default_flags[ctype];
> @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	if (!ptr)
>  		return;
>  	pc = lookup_page_cgroup(page);
> -	mem_cgroup_lru_del_before_commit_swapcache(page);
> +	smp_rmb();
> +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
>  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> -	mem_cgroup_lru_add_after_commit_swapcache(page);
> +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>
>
> --
> 	Balbir
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:18   ` Daisuke Nishimura
@ 2009-06-15  2:23     ` KAMEZAWA Hiroyuki
  2009-06-15  2:44       ` Balbir Singh
  2009-06-15  3:00     ` Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-15  2:23 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: balbir, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

On Mon, 15 Jun 2009 11:18:17 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > Here is v4 of the patches, please review and comment
> >
> > Feature: Remove the overhead associated with the root cgroup
> >
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > changelog v4 -> v3
> > 1. Rebase to mmotm 9th june 2009
> > 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
> >    we do only accounting and no reclaim.
> hmm, I prefer the previous version of PCG_ACCT_LRU meaning. It can be
> used to remove annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.
>
> > 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
> >    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
> It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
> outside of zone->lru_lock.
>
> IMHO, the most complicated case is a SwapCache which has been read ahead by
> a *different* cpu from the cpu doing do_swap_page(). Those SwapCache can be
> on page_vec and be drained to LRU asymmetrically with do_swap_page().
> Well, yes it would be safe just because PCGF_ACCT_LRU would not be set
> if PCGF_USED has not been set, but I don't think it's a good idea to touch
> PCGF_ACCT_LRU outside of zone->lru_lock anyway.
>
>
> Doesn't a patch like below work for you ?
> Lightly tested under global memory pressure(w/o memcg's memory pressure)
> on a small machine(just a bit modified from then though).
>
This patch includes almost all what I want ;)

Thanks,
-Kame
* Re: Low overhead patches for the memory cgroup controller (v4) 2009-06-15 2:23 ` KAMEZAWA Hiroyuki @ 2009-06-15 2:44 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2009-06-15 2:44 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro KAMEZAWA Hiroyuki wrote: > On Mon, 15 Jun 2009 11:18:17 +0900 > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > >> On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >>> Here is v4 of the patches, please review and comment >>> >>> Feature: Remove the overhead associated with the root cgroup >>> >>> From: Balbir Singh <balbir@linux.vnet.ibm.com> >>> >>> changelog v4 -> v3 >>> 1. Rebase to mmotm 9th june 2009 >>> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that >>> we do only accounting and no reclaim. >> hmm, I prefer the previous version of PCG_ACCT_LRU meaning. It can be >> used to remove annoying list_empty(&pc->lru) and !pc->mem_cgroup checks. >> >>> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone, >>> we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list >> It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU >> outside of zone->lru_lock. >> >> IMHO, the most complicated case is a SwapCache which has been read ahead by >> a *different* cpu from the cpu doing do_swap_page(). Those SwapCache can be >> on page_vec and be drained to LRU asymmetrically with do_swap_page(). >> Well, yes it would be safe just because PCGF_ACCT_LRU would not be set >> if PCGF_USED has not been set, but I don't think it's a good idea to touch >> PCGF_ACCT_LRU outside of zone->lru_lock anyway. >> >> >> Doesn't a patch like below work for you ? >> Lightly tested under global memory pressure(w/o memcg's memory pressure) >> on a small machine(just a bit modified from then though). 
>>

OK, so you like the older meaning and implementation. The code seems fine to me,
and I like the removal of the list_empty() checks that you and Kame have proposed.

-- 
	Balbir
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:18 ` Daisuke Nishimura
  2009-06-15  2:23 ` KAMEZAWA Hiroyuki
@ 2009-06-15  3:00 ` Balbir Singh
  2009-06-15  3:09 ` Daisuke Nishimura
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  3:00 UTC
To: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Daisuke Nishimura wrote:

>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];

pc->flags needs to be reset here, otherwise we run the danger of carrying over
older bits. I'll merge your changes and test.

-- 
	Balbir
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:00 ` Balbir Singh
@ 2009-06-15  3:09 ` Daisuke Nishimura
  2009-06-15  3:22 ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  3:09 UTC
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura

On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Daisuke Nishimura wrote:
>
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
>
> pc->flags needs to be reset here, otherwise we have the danger of carrying over
> older bits. I'll merge your changes and test.
>
hmm, why ?

This is what I do in my patch:

+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		ClearPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}

So, all the necessary flags are set and all the unnecessary ones are cleared, right ?
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:09 ` Daisuke Nishimura
@ 2009-06-15  3:22 ` Balbir Singh
  2009-06-15  3:46 ` Daisuke Nishimura
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  3:22 UTC
To: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Daisuke Nishimura wrote:
> On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Daisuke Nishimura wrote:
>>
>>>  	pc->mem_cgroup = mem;
>>>  	smp_wmb();
>>> -	pc->flags = pcg_default_flags[ctype];
>> pc->flags needs to be reset here, otherwise we have the danger of carrying over
>> older bits. I'll merge your changes and test.
>>
> hmm, why ?
>
> I do in my patch:
>
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		ClearPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
>

Yes, I did that in the older code. What I was suggesting was just an additional
step, so that if we add new flags in the future we don't end up with a long list
of initializations and clears, and so that forgetting to clear pc->flags before
reusing a page_cgroup does not become a problem. My message was confusing; it
should have said that resetting pc->flags will provide protection against any
future addition of flags.

I am testing your patch, which is v3 modified with your changes, and I will add
your Signed-off-by to it when I post v5. Is that OK?

-- 
	Balbir
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:22 ` Balbir Singh
@ 2009-06-15  3:46 ` Daisuke Nishimura
  2009-06-15  4:22 ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  3:46 UTC
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura

On Mon, 15 Jun 2009 08:52:56 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Daisuke Nishimura wrote:
> > On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >> Daisuke Nishimura wrote:
> >>
> >>>  	pc->mem_cgroup = mem;
> >>>  	smp_wmb();
> >>> -	pc->flags = pcg_default_flags[ctype];
> >> pc->flags needs to be reset here, otherwise we have the danger of carrying over
> >> older bits. I'll merge your changes and test.
> >>
> > hmm, why ?
> >
> > I do in my patch:
> >
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		ClearPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> >
>
> Yes, I did that in the older code, what I was suggesting was just an additional
> step to ensure that in the future if we add new flags, we don't end up with a
> long list of initializations and clearing or if we forget to clear pc->flags and
> reuse the page_cgroup, it might be a problem. My message was confusing, it
> should have been resetting the pc->flags will provide protection for any future
> addition of flags.
>
O.K. I see your point.

But we shouldn't touch the PCG_ACCT_LRU flag here. IIUC, that's why we abandoned
pcg_default_flags[]. Please take care of it.

> I am testing your patch which is the modified version of v3 with your changes
> and have your signed-off-by in it as well as I post v5. Is that OK?
>
Sure :)

Thanks,
Daisuke Nishimura.
* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:46 ` Daisuke Nishimura
@ 2009-06-15  4:22 ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  4:22 UTC
To: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Daisuke Nishimura wrote:
> On Mon, 15 Jun 2009 08:52:56 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Daisuke Nishimura wrote:
>>> On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>>> Daisuke Nishimura wrote:
>>>>
>>>>>  	pc->mem_cgroup = mem;
>>>>>  	smp_wmb();
>>>>> -	pc->flags = pcg_default_flags[ctype];
>>>> pc->flags needs to be reset here, otherwise we have the danger of carrying over
>>>> older bits. I'll merge your changes and test.
>>>>
>>> hmm, why ?
>>>
>>> I do in my patch:
>>>
>>> +	switch (ctype) {
>>> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
>>> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
>>> +		SetPageCgroupCache(pc);
>>> +		SetPageCgroupUsed(pc);
>>> +		break;
>>> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
>>> +		ClearPageCgroupCache(pc);
>>> +		SetPageCgroupUsed(pc);
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>>
>> Yes, I did that in the older code, what I was suggesting was just an additional
>> step to ensure that in the future if we add new flags, we don't end up with a
>> long list of initializations and clearing or if we forget to clear pc->flags and
>> reuse the page_cgroup, it might be a problem. My message was confusing, it
>> should have been resetting the pc->flags will provide protection for any future
>> addition of flags.
>>
> O.K. I see your point.
>
> But we shouldn't touch PCG_ACCT_LRU flag here. IIUC, that's why we abandon
> pcg_default_flags[]. Please take care of it.
>

I am keeping the pc->flags assignment removed, as in the earlier patch, but this
is something to keep in mind as we review further changes to the flags field.
>> I am testing your patch which is the modified version of v3 with your changes
>> and have your signed-off-by in it as well as I post v5. Is that OK?
>>
> Sure :)
>

Just sending it out now. Thanks!

-- 
	Balbir
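
The exchange above turns on two requirements: a reused page_cgroup must not carry stale charge bits into a new charge, and the commit path must leave PCG_ACCT_LRU alone, because that bit belongs to the LRU code under zone->lru_lock. Below is a minimal userspace sketch of one way to meet both. It is not the kernel code: the flag layout, `PCG_CHARGE_MASK` and `commit_charge_flags()` are names assumed here for illustration, and C11 atomics stand in for the kernel's bit operations.

```c
#include <stdatomic.h>

/* Hypothetical flag layout, modelled on the enum discussed in this thread. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };

enum charge_type { CHARGE_CACHE, CHARGE_MAPPED, CHARGE_SHMEM };

#define PCG_BIT(n) (1UL << (n))

/* Bits owned by the charge path; PCG_LOCK and PCG_ACCT_LRU are left out on purpose. */
#define PCG_CHARGE_MASK (PCG_BIT(PCG_CACHE) | PCG_BIT(PCG_USED))

/*
 * Drop any stale charge bits, then set the ones this charge type needs.
 * Each step is an atomic read-modify-write of the flag word, so an update
 * of PCG_ACCT_LRU made by another path is never overwritten.
 */
static void commit_charge_flags(atomic_ulong *flags, enum charge_type ctype)
{
	unsigned long set = PCG_BIT(PCG_USED);

	if (ctype == CHARGE_CACHE || ctype == CHARGE_SHMEM)
		set |= PCG_BIT(PCG_CACHE);

	atomic_fetch_and(flags, ~PCG_CHARGE_MASK);
	atomic_fetch_or(flags, set);
}
```

With this shape, a charge-owned flag added later only needs to be listed in the mask, which covers the future-proofing concern without going back to a whole-word store.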
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 17:45 [RFC] Low overhead patches for the memory cgroup controller (v2) KAMEZAWA Hiroyuki
  2009-05-15 18:16 ` Balbir Singh
@ 2009-05-17  4:15 ` Balbir Singh
  2009-06-01  4:25 ` Daisuke Nishimura
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-17  4:15 UTC
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> I think set/clear flag here adds race condition....because pc->flags is
> modified by
>   pc->flags = pcg_default_flags[ctype] in commit_charge()
> you have to modify above lines to be
>
>   SetPageCgroupCache(pc) or some..
>   ...
>   SetPageCgroupUsed(pc)
>
> Then, you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so,
>  non atomic code is used.)
>

Here is the next version of the patch


Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup,
pcg_default_flags is now obsolete, but I've not removed it yet. It
provides some readability to help the code.

Tests:
1. Tested lightly; previous versions showed a good performance improvement (10%).

NOTE:
I haven't got the time right now to run oprofile and get detailed test results,
since I am in the middle of travel.

Please review the code for functional correctness and if you can test
it even better.
I would like to push this in, especially if the % performance difference I am seeing is reproducible elsewhere as well. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- include/linux/page_cgroup.h | 12 ++++++++++++ mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- mm/page_cgroup.c | 1 - 3 files changed, 50 insertions(+), 5 deletions(-) diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index 7339c7b..ebdae9a 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -26,6 +26,8 @@ enum { PCG_LOCK, /* page cgroup is locked */ PCG_CACHE, /* charged as cache */ PCG_USED, /* this object is in use. */ + PCG_ROOT, /* page belongs to root cgroup */ + PCG_ACCT, /* page has been accounted for */ }; #define TESTPCGFLAG(uname, lname) \ @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ /* Cache flag is set only once (at allocation) */ TESTPCGFLAG(Cache, CACHE) +SETPCGFLAG(Cache, CACHE) TESTPCGFLAG(Used, USED) CLEARPCGFLAG(Used, USED) +SETPCGFLAG(Used, USED) + +SETPCGFLAG(Root, ROOT) +CLEARPCGFLAG(Root, ROOT) +TESTPCGFLAG(Root, ROOT) + +SETPCGFLAG(Acct, ACCT) +CLEARPCGFLAG(Acct, ACCT) +TESTPCGFLAG(Acct, ACCT) static inline int page_cgroup_nid(struct page_cgroup *pc) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9712ef7..35415fc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -43,6 +43,7 @@ struct cgroup_subsys mem_cgroup_subsys __read_mostly; #define MEM_CGROUP_RECLAIM_RETRIES 5 +struct mem_cgroup *root_mem_cgroup __read_mostly; #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */ @@ -196,6 +197,10 @@ enum charge_type { #define PCGF_CACHE (1UL << PCG_CACHE) #define PCGF_USED (1UL << PCG_USED) #define PCGF_LOCK (1UL << PCG_LOCK) +/* Not used, but added here for completeness */ +#define PCGF_ROOT (1UL << PCG_ROOT) +#define PCGF_ACCT (1UL << PCG_ACCT) + static const unsigned long 
pcg_default_flags[NR_CHARGE_TYPE] = { PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */ @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) return; pc = lookup_page_cgroup(page); /* can happen while we handle swapcache. */ - if (list_empty(&pc->lru) || !pc->mem_cgroup) + if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup) return; /* * We don't check PCG_USED bit. It's cleared when the "page" is finally @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) mz = page_cgroup_zoneinfo(pc); mem = pc->mem_cgroup; MEM_CGROUP_ZSTAT(mz, lru) -= 1; + ClearPageCgroupAcct(pc); + if (PageCgroupRoot(pc)) + return; list_del_init(&pc->lru); return; } @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) * For making pc->mem_cgroup visible, insert smp_rmb() here. */ smp_rmb(); - /* unused page is not rotated. */ - if (!PageCgroupUsed(pc)) + /* unused or root page is not rotated. */ + if (!PageCgroupUsed(pc) || PageCgroupRoot(pc)) return; mz = page_cgroup_zoneinfo(pc); list_move(&pc->lru, &mz->lists[lru]); @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) mz = page_cgroup_zoneinfo(pc); MEM_CGROUP_ZSTAT(mz, lru) += 1; + SetPageCgroupAcct(pc); + if (PageCgroupRoot(pc)) + return; list_add(&pc->lru, &mz->lists[lru]); } @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, css_put(&mem->css); return; } + pc->mem_cgroup = mem; smp_wmb(); - pc->flags = pcg_default_flags[ctype]; + switch (ctype) { + case MEM_CGROUP_CHARGE_TYPE_CACHE: + case MEM_CGROUP_CHARGE_TYPE_SHMEM: + SetPageCgroupCache(pc); + SetPageCgroupUsed(pc); + break; + case MEM_CGROUP_CHARGE_TYPE_MAPPED: + SetPageCgroupUsed(pc); + break; + default: + break; + } + + if (mem == root_mem_cgroup) + SetPageCgroupRoot(pc); mem_cgroup_charge_statistics(mem, pc, true); @@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum 
charge_type ctype) mem_cgroup_charge_statistics(mem, pc, false); ClearPageCgroupUsed(pc); + if (mem == root_mem_cgroup) + ClearPageCgroupRoot(pc); /* * pc->mem_cgroup is not cleared here. It will be accessed when it's * freed from LRU. This is safe because uncharged page is expected not @@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, name = MEMFILE_ATTR(cft->private); switch (name) { case RES_LIMIT: + if (memcg == root_mem_cgroup) { /* Can't set limit on root */ + ret = -EINVAL; + break; + } /* This function does all necessary parse...reuse it */ ret = res_counter_memparse_write_strategy(buffer, &val); if (ret) @@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) if (cont->parent == NULL) { enable_swap_cgroup(); parent = NULL; + root_mem_cgroup = mem; } else { parent = mem_cgroup_from_cont(cont->parent); mem->use_hierarchy = parent->use_hierarchy; @@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) return &mem->css; free_out: __mem_cgroup_free(mem); + root_mem_cgroup = NULL; return ERR_PTR(error); } diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c index 09b73c5..6145ff6 100644 --- a/mm/page_cgroup.c +++ b/mm/page_cgroup.c @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) #endif - #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP static DEFINE_MUTEX(swap_cgroup_mutex); -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
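
Kamezawa's objection, quoted at the top of this message, is that a whole-word store such as `pc->flags = pcg_default_flags[ctype]` can wipe out a bit that another path has just set, whereas per-bit atomic updates cannot. The single-threaded sketch below replays one such interleaving deterministically in userspace. The function names are assumptions made for illustration, and C11 `atomic_fetch_or` stands in for the kernel's `set_bit()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };
#define PCGF(n) (1UL << (n))

/* LRU path: mark the page as accounted (stands in for SetPageCgroupAcct()). */
static void lru_add(atomic_ulong *flags)
{
	atomic_fetch_or(flags, PCGF(PCG_ACCT));
}

/* Old commit path: overwrite the whole word with the default flags. */
static void commit_by_store(atomic_ulong *flags)
{
	atomic_store(flags, PCGF(PCG_CACHE) | PCGF(PCG_USED) | PCGF(PCG_LOCK));
}

/* New commit path: set only the bits that the charge owns. */
static void commit_by_bits(atomic_ulong *flags)
{
	atomic_fetch_or(flags, PCGF(PCG_CACHE) | PCGF(PCG_USED));
}

/* Replay "LRU add, then commit" and report whether PCG_ACCT survived. */
static bool acct_survives(void (*commit)(atomic_ulong *))
{
	atomic_ulong flags = PCGF(PCG_LOCK);	/* page_cgroup lock already held */

	lru_add(&flags);
	commit(&flags);
	return (atomic_load(&flags) & PCGF(PCG_ACCT)) != 0;
}
```

Calling `acct_survives(commit_by_store)` models the old assignment and `acct_survives(commit_by_bits)` models the switch statement in the patch; only the second keeps the accounting bit.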
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2) 2009-05-17 4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh @ 2009-06-01 4:25 ` Daisuke Nishimura 2009-06-01 5:01 ` Daisuke Nishimura 2009-06-01 5:49 ` Balbir Singh 0 siblings, 2 replies; 30+ messages in thread From: Daisuke Nishimura @ 2009-06-01 4:25 UTC (permalink / raw) To: balbir Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura I'm sorry for my very late reply. I've been working on the stale swap cache problem for a long time as you know :) On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]: > > > I think set/clear flag here adds race condtion....because pc->flags is > > modfied by > > pc->flags = pcg_dafault_flags[ctype] in commit_charge() > > you have to modify above lines to be > > > > SetPageCgroupCache(pc) or some.. > > ... > > SetPageCgroupUsed(pc) > > > > Then, you can use set_bit() without lock_page_cgroup(). > > (Currently, pc->flags is modified only under lock_page_cgroup(), so, > > non atomic code is used.) > > > > Here is the next version of the patch > > > Feature: Remove the overhead associated with the root cgroup > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > This patch changes the memory cgroup and removes the overhead associated > with accounting all pages in the root cgroup. As a side-effect, we can > no longer set a memory hard limit in the root cgroup. > I agree to this idea itself. > A new flag is used to track page_cgroup associated with the root cgroup > pages. A new flag to track whether the page has been accounted or not > has been added as well. Flags are now set atomically for page_cgroup, > pcg_default_flags is now obsolete, but I've not removed it yet. 
It > provides some readability to help the code. > > Tests: > 1. Tested lightly, previous versions showed good performance improvement 10%. > You should test current version :) And I think you should test this patch under global memory pressure too to check whether it doesn't cause bug or under/over flow of something, etc. memcg's LRU handling about SwapCache is different from usual one. > NOTE: > I haven't got the time right now to run oprofile and get detailed test results, > since I am in the middle of travel. > > Please review the code for functional correctness and if you can test > it even better. I would like to push this in, especially if the % > performance difference I am seeing is reproducible elsewhere as well. > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > > include/linux/page_cgroup.h | 12 ++++++++++++ > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- > mm/page_cgroup.c | 1 - > 3 files changed, 50 insertions(+), 5 deletions(-) > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > index 7339c7b..ebdae9a 100644 > --- a/include/linux/page_cgroup.h > +++ b/include/linux/page_cgroup.h > @@ -26,6 +26,8 @@ enum { > PCG_LOCK, /* page cgroup is locked */ > PCG_CACHE, /* charged as cache */ > PCG_USED, /* this object is in use. */ > + PCG_ROOT, /* page belongs to root cgroup */ > + PCG_ACCT, /* page has been accounted for */ > }; > Those new flags are protected by zone->lru_lock, right ? If so, please add some comments. And I'm not sure why you need 2 flags. Isn't PCG_ROOT enough for you ? 
> #define TESTPCGFLAG(uname, lname) \ > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ > > /* Cache flag is set only once (at allocation) */ > TESTPCGFLAG(Cache, CACHE) > +SETPCGFLAG(Cache, CACHE) > > TESTPCGFLAG(Used, USED) > CLEARPCGFLAG(Used, USED) > +SETPCGFLAG(Used, USED) > + > +SETPCGFLAG(Root, ROOT) > +CLEARPCGFLAG(Root, ROOT) > +TESTPCGFLAG(Root, ROOT) > + > +SETPCGFLAG(Acct, ACCT) > +CLEARPCGFLAG(Acct, ACCT) > +TESTPCGFLAG(Acct, ACCT) > > static inline int page_cgroup_nid(struct page_cgroup *pc) > { > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 9712ef7..35415fc 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -43,6 +43,7 @@ > > struct cgroup_subsys mem_cgroup_subsys __read_mostly; > #define MEM_CGROUP_RECLAIM_RETRIES 5 > +struct mem_cgroup *root_mem_cgroup __read_mostly; > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */ > @@ -196,6 +197,10 @@ enum charge_type { > #define PCGF_CACHE (1UL << PCG_CACHE) > #define PCGF_USED (1UL << PCG_USED) > #define PCGF_LOCK (1UL << PCG_LOCK) > +/* Not used, but added here for completeness */ > +#define PCGF_ROOT (1UL << PCG_ROOT) > +#define PCGF_ACCT (1UL << PCG_ACCT) > + > static const unsigned long > pcg_default_flags[NR_CHARGE_TYPE] = { > PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */ > @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > return; > pc = lookup_page_cgroup(page); > /* can happen while we handle swapcache. */ > - if (list_empty(&pc->lru) || !pc->mem_cgroup) > + if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup) > return; > /* > * We don't check PCG_USED bit. 
It's cleared when the "page" is finally > @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru) > mz = page_cgroup_zoneinfo(pc); > mem = pc->mem_cgroup; > MEM_CGROUP_ZSTAT(mz, lru) -= 1; > + ClearPageCgroupAcct(pc); > + if (PageCgroupRoot(pc)) > + return; > list_del_init(&pc->lru); > return; > } > @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru) > * For making pc->mem_cgroup visible, insert smp_rmb() here. > */ > smp_rmb(); > - /* unused page is not rotated. */ > - if (!PageCgroupUsed(pc)) > + /* unused or root page is not rotated. */ > + if (!PageCgroupUsed(pc) || PageCgroupRoot(pc)) > return; > mz = page_cgroup_zoneinfo(pc); > list_move(&pc->lru, &mz->lists[lru]); > @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru) > > mz = page_cgroup_zoneinfo(pc); > MEM_CGROUP_ZSTAT(mz, lru) += 1; > + SetPageCgroupAcct(pc); > + if (PageCgroupRoot(pc)) > + return; > list_add(&pc->lru, &mz->lists[lru]); > } > > @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, > css_put(&mem->css); > return; > } > + > pc->mem_cgroup = mem; > smp_wmb(); > - pc->flags = pcg_default_flags[ctype]; > + switch (ctype) { > + case MEM_CGROUP_CHARGE_TYPE_CACHE: > + case MEM_CGROUP_CHARGE_TYPE_SHMEM: > + SetPageCgroupCache(pc); > + SetPageCgroupUsed(pc); > + break; > + case MEM_CGROUP_CHARGE_TYPE_MAPPED: > + SetPageCgroupUsed(pc); > + break; > + default: > + break; > + } > + > + if (mem == root_mem_cgroup) > + SetPageCgroupRoot(pc); > > mem_cgroup_charge_statistics(mem, pc, true); > Shouldn't we set PCG_LOCK ? unlock_page_cgroup() will be called after this. Moreover, IIUC, pc->flags is not cleared at page free/alloc, so if a page is reused, pc->flags has the old value. PCG_CACHE flag, at least, is used by the decision in mem_cgroup_charge_statistics(). 
> @@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > mem_cgroup_charge_statistics(mem, pc, false); > > ClearPageCgroupUsed(pc); > + if (mem == root_mem_cgroup) > + ClearPageCgroupRoot(pc); > /* > * pc->mem_cgroup is not cleared here. It will be accessed when it's > * freed from LRU. This is safe because uncharged page is expected not > @@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, > name = MEMFILE_ATTR(cft->private); > switch (name) { > case RES_LIMIT: > + if (memcg == root_mem_cgroup) { /* Can't set limit on root */ > + ret = -EINVAL; > + break; > + } > /* This function does all necessary parse...reuse it */ > ret = res_counter_memparse_write_strategy(buffer, &val); > if (ret) It's a nitpick, I prefer not to show *.limit_in_bytes if we cannot write to them. Thanks, Daisuke Nishimura. > @@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > if (cont->parent == NULL) { > enable_swap_cgroup(); > parent = NULL; > + root_mem_cgroup = mem; > } else { > parent = mem_cgroup_from_cont(cont->parent); > mem->use_hierarchy = parent->use_hierarchy; > @@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > return &mem->css; > free_out: > __mem_cgroup_free(mem); > + root_mem_cgroup = NULL; > return ERR_PTR(error); > } > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c > index 09b73c5..6145ff6 100644 > --- a/mm/page_cgroup.c > +++ b/mm/page_cgroup.c > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) > > #endif > > - > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > > static DEFINE_MUTEX(swap_cgroup_mutex); > > > -- > Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-06-01  4:25 ` Daisuke Nishimura
@ 2009-06-01  5:01 ` Daisuke Nishimura
  2009-06-01  5:49 ` Balbir Singh
  1 sibling, 0 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-01  5:01 UTC
To: balbir
Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro, Daisuke Nishimura

> > @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >
> >  	mem_cgroup_charge_statistics(mem, pc, true);
> >
> Shouldn't we set PCG_LOCK ?
> unlock_page_cgroup() will be called after this.
>
Ah, lock_page_cgroup() has already set it.
Please ignore this comment.

Sorry for the noise.

Daisuke Nishimura.
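
The correction above rests on PCG_LOCK being the lock itself: IIUC, lock_page_cgroup() takes a bit spinlock on pc->flags, so the bit is already set while the charge is committed, which is also why the old pcg_default_flags[] entries carried PCGF_LOCK. A rough userspace analogue of such a bit lock, written with C11 atomics, is sketched below. `flag_bit_trylock()`, `flag_bit_lock()` and `flag_bit_unlock()` are hypothetical names, not kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { PCG_LOCK, PCG_CACHE, PCG_USED };
#define PCGF(n) (1UL << (n))

/* Try once to take the lock bit; returns true if this caller now owns it. */
static bool flag_bit_trylock(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_or_explicit(flags, PCGF(PCG_LOCK),
						     memory_order_acquire);

	return !(old & PCGF(PCG_LOCK));
}

/* Spin until the lock bit is ours. */
static void flag_bit_lock(atomic_ulong *flags)
{
	while (!flag_bit_trylock(flags))
		;	/* busy-wait; a real implementation would relax the CPU here */
}

/* Clear only the lock bit, leaving every other flag untouched. */
static void flag_bit_unlock(atomic_ulong *flags)
{
	atomic_fetch_and_explicit(flags, ~PCGF(PCG_LOCK), memory_order_release);
}
```

Because the lock shares a word with the other flags, any path that assigns the whole word while holding the lock has to write the lock bit back as well; setting individual bits avoids that coupling.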
* Re: [RFC] Low overhead patches for the memory cgroup controller (v2) 2009-06-01 4:25 ` Daisuke Nishimura 2009-06-01 5:01 ` Daisuke Nishimura @ 2009-06-01 5:49 ` Balbir Singh 1 sibling, 0 replies; 30+ messages in thread From: Balbir Singh @ 2009-06-01 5:49 UTC (permalink / raw) To: Daisuke Nishimura Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro * nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-01 13:25:05]: > I'm sorry for my very late reply. > > I've been working on the stale swap cache problem for a long time as you know :) > > On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]: > > > > > I think set/clear flag here adds race condtion....because pc->flags is > > > modfied by > > > pc->flags = pcg_dafault_flags[ctype] in commit_charge() > > > you have to modify above lines to be > > > > > > SetPageCgroupCache(pc) or some.. > > > ... > > > SetPageCgroupUsed(pc) > > > > > > Then, you can use set_bit() without lock_page_cgroup(). > > > (Currently, pc->flags is modified only under lock_page_cgroup(), so, > > > non atomic code is used.) > > > > > > > Here is the next version of the patch > > > > > > Feature: Remove the overhead associated with the root cgroup > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com> > > > > This patch changes the memory cgroup and removes the overhead associated > > with accounting all pages in the root cgroup. As a side-effect, we can > > no longer set a memory hard limit in the root cgroup. > > > I agree to this idea itself. > Thanks! > > A new flag is used to track page_cgroup associated with the root cgroup > > pages. A new flag to track whether the page has been accounted or not > > has been added as well. 
Flags are now set atomically for page_cgroup, > > pcg_default_flags is now obsolete, but I've not removed it yet. It > > provides some readability to help the code. > > > > Tests: > > 1. Tested lightly, previous versions showed good performance improvement 10%. > > > You should test current version :) > And I think you should test this patch under global memory pressure too > to check whether it doesn't cause bug or under/over flow of something, etc. > memcg's LRU handling about SwapCache is different from usual one. > OK, I've tested it using my stress tool, but I'll modify to add some of the things you've pointed out. > > NOTE: > > I haven't got the time right now to run oprofile and get detailed test results, > > since I am in the middle of travel. > > > > Please review the code for functional correctness and if you can test > > it even better. I would like to push this in, especially if the % > > performance difference I am seeing is reproducible elsewhere as well. > > > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > > --- > > > > include/linux/page_cgroup.h | 12 ++++++++++++ > > mm/memcontrol.c | 42 ++++++++++++++++++++++++++++++++++++++---- > > mm/page_cgroup.c | 1 - > > 3 files changed, 50 insertions(+), 5 deletions(-) > > > > > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h > > index 7339c7b..ebdae9a 100644 > > --- a/include/linux/page_cgroup.h > > +++ b/include/linux/page_cgroup.h > > @@ -26,6 +26,8 @@ enum { > > PCG_LOCK, /* page cgroup is locked */ > > PCG_CACHE, /* charged as cache */ > > PCG_USED, /* this object is in use. */ > > + PCG_ROOT, /* page belongs to root cgroup */ > > + PCG_ACCT, /* page has been accounted for */ > > }; > > > Those new flags are protected by zone->lru_lock, right ? > If so, please add some comments. > And I'm not sure why you need 2 flags. Isn't PCG_ROOT enough for you ? > Nope.. the accounting is independent of charge/uncharge. 
-- 
	Balbir