Re: [RFC] Low overhead patches for the memory cgroup controller (v2)

linux-mm.kvack.org archive mirror

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
@ 2009-05-15 17:45 KAMEZAWA Hiroyuki
  2009-05-15 18:16 ` Balbir Singh
  2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
  0 siblings, 2 replies; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-15 17:45 UTC
  To: balbir
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	KAMEZAWA Hiroyuki, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Balbir Singh wrote:
> Feature: Remove the overhead associated with the root cgroup
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> This patch changes the memory cgroup and removes the overhead associated
> with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> can
> no longer set a memory hard limit in the root cgroup.
>
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well.
>
> Review comments highly appreciated
>
> Tests
>
> 1. Tested with allocate, touch and limit test case for a non-root cgroup
> 2. For the root cgroup tested performance impact with reaim
>
>
> 		+patch		mmtom-08-may-2009
> AIM9		1362.93		1338.17
> Dbase		17457.75	16021.58
> New Dbase	18070.18	16518.54
> Shared		9681.85		8882.11
> Compute		16197.79	15226.13
>
Hmm, at first impression, I'm not convinced by these numbers...
Does just avoiding list_add/del really make programs _10%_ faster?
Could you show the change in CPU cache-miss rate, if you can?
(And why does AIM9 go bad?)
Hmm, page_cgroup_zoneinfo() is accessed anyway, so the per-zone counter
is not the problem here.

Could you show your .config and environment?
If I trust the numbers above, it seems there is more room for
optimization/prefetching in the usual path.

BTW, how does the performance change in child (non-default) groups?

> 3. Tested accounting in root cgroup to make sure it looks sane and
> correct.
>
Not sure but swap and shmem case should be checked carefully..


> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
>  include/linux/page_cgroup.h |   10 ++++++++++
>  mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 36 insertions(+), 4 deletions(-)
>
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..8b85752 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT, /* page has been accounted for */
Reading the code, this PCG_ACCT should be named PCG_AcctLRU.

>  };
>
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(Acct, ACCT)
> +CLEARPCGFLAG(Acct, ACCT)
> +TESTPCGFLAG(Acct, ACCT)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9712ef7..18d2819 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account
> = 0 */
> @@ -196,6 +197,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum
> lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum
> lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
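
The root-cgroup shortcut in this hunk, together with the matching
mem_cgroup_add_lru_list() hunk further down, can be sketched in userspace as
follows. This is only an illustration under assumptions: the struct and
function names (pc_sketch, zone_sketch, sketch_add_lru, sketch_del_lru) are
hypothetical, the flags are plain booleans instead of bits in pc->flags, and
the !pc->mem_cgroup check and all locking are left out. It shows that the
per-zone counter is still maintained for root pages while the list itself is
never touched.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, standing in for struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void node_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static bool node_empty(const struct list_node *n)
{
	return n->next == n;
}

static void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_sketch {
	struct list_node lru;
	bool root;	/* plays the role of PCG_ROOT */
	bool acct;	/* plays the role of PCG_ACCT */
};

/* Hypothetical stand-in for the per-zone info: one list, one counter. */
struct zone_sketch {
	struct list_node list;
	long count;	/* plays the role of MEM_CGROUP_ZSTAT(mz, lru) */
};

static void sketch_add_lru(struct zone_sketch *mz, struct pc_sketch *pc)
{
	mz->count += 1;
	pc->acct = true;
	if (pc->root)
		return;		/* root pages are counted but not linked */
	node_add(&pc->lru, &mz->list);
}

static void sketch_del_lru(struct zone_sketch *mz, struct pc_sketch *pc)
{
	/* Not accounted and not on a list: nothing to undo. */
	if (!pc->acct && node_empty(&pc->lru))
		return;
	mz->count -= 1;
	pc->acct = false;
	if (pc->root)
		return;		/* root pages were never linked */
	node_del_init(&pc->lru);
}
```

For a root page pc->lru is always empty, so the accounted flag is the only
thing that tells the delete path whether the counter was incremented earlier.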


> @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page,
> enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum
> lru_list lru)
>
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
I think setting/clearing the flag here adds a race condition, because
pc->flags is modified by
  pc->flags = pcg_default_flags[ctype] in commit_charge()
You have to change the line above to be

  SetPageCgroupCache(pc) or some..
  ...
  SetPageCgroupUsed(pc)

Then, you can use set_bit() without lock_page_cgroup().
(Currently, pc->flags is modified only under lock_page_cgroup(), so,
 non atomic code is used.)
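
A minimal userspace sketch of the lost update described above, assuming C11
atomics as a stand-in for the kernel bitops. The names pc_sketch,
sketch_set_bit, charge_by_store, charge_by_bits and acct_survives are
hypothetical; only the flag names come from the patch. One interleaving is
replayed sequentially so the result is deterministic: the LRU side sets ACCT
first, then the charge side commits.

```c
#include <stdatomic.h>

/* Bit numbers follow the order of the enum in the patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };

#define PCGF(bit) (1UL << (bit))

/* Hypothetical stand-in for struct page_cgroup: only the flags word. */
struct pc_sketch {
	atomic_ulong flags;
};

/* Userspace analogue of set_bit(): atomic read-modify-write of one bit. */
static void sketch_set_bit(int bit, struct pc_sketch *pc)
{
	atomic_fetch_or(&pc->flags, PCGF(bit));
}

static int sketch_test_bit(int bit, struct pc_sketch *pc)
{
	return (atomic_load(&pc->flags) & PCGF(bit)) != 0;
}

/* Charge path that overwrites the whole word, like pcg_default_flags[]. */
static void charge_by_store(struct pc_sketch *pc)
{
	atomic_store(&pc->flags,
		     PCGF(PCG_CACHE) | PCGF(PCG_USED) | PCGF(PCG_LOCK));
}

/* Charge path that sets each bit individually, as suggested above. */
static void charge_by_bits(struct pc_sketch *pc)
{
	sketch_set_bit(PCG_CACHE, pc);
	sketch_set_bit(PCG_USED, pc);
}

/* Returns 1 if the ACCT bit survives the charge, 0 if it is lost. */
static int acct_survives(void (*charge)(struct pc_sketch *))
{
	struct pc_sketch pc;

	atomic_init(&pc.flags, 0);
	sketch_set_bit(PCG_ACCT, &pc);	/* LRU side runs first */
	charge(&pc);			/* charge side commits afterwards */
	return sketch_test_bit(PCG_ACCT, &pc);
}
```

With the whole-word store the ACCT bit is wiped out; with per-bit updates it
is preserved, which is why the flags word can then be touched from both paths
without taking lock_page_cgroup().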

Regards,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 17:45 [RFC] Low overhead patches for the memory cgroup controller (v2) KAMEZAWA Hiroyuki
@ 2009-05-15 18:16 ` Balbir Singh
  2009-05-18 10:11   ` KAMEZAWA Hiroyuki
  2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-15 18:16 UTC
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> Balbir Singh wrote:
> > Feature: Remove the overhead associated with the root cgroup
> >
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > This patch changes the memory cgroup and removes the overhead associated
> > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > can
> > no longer set a memory hard limit in the root cgroup.
> >
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well.
> >
> > Review comments highly appreciated
> >
> > Tests
> >
> > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > 2. For the root cgroup tested performance impact with reaim
> >
> >
> > 		+patch		mmtom-08-may-2009
> > AIM9		1362.93		1338.17
> > Dbase		17457.75	16021.58
> > New Dbase	18070.18	16518.54
> > Shared		9681.85		8882.11
> > Compute		16197.79	15226.13
> >
> Hmm, at first impression, I'm not convinced by these numbers...
> Does just avoiding list_add/del really make programs _10%_ faster?
> Could you show the change in CPU cache-miss rate, if you can?
> (And why does AIM9 go bad?)

OK... I'll try but I am away on travel for 3 weeks :( you can try and run
this as well

> Hmm, page_cgroup_zoneinfo() is accessed anyway, so the per-zone counter
> is not the problem here.
> 
> Could you show your .config and environment?
> If I trust the numbers above, it seems there is more room for
> optimization/prefetching in the usual path.
> 
> BTW, how does the performance change in child (non-default) groups?
> 

I've not seen the impact of that. I'll try.


> > 3. Tested accounting in root cgroup to make sure it looks sane and
> > correct.
> >
> Not sure but swap and shmem case should be checked carefully..
> 
> 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> >
> >  include/linux/page_cgroup.h |   10 ++++++++++
> >  mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 36 insertions(+), 4 deletions(-)
> >
> >
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..8b85752 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ROOT, /* page belongs to root cgroup */
> > +	PCG_ACCT, /* page has been accounted for */
> Reading the code, this PCG_ACCT should be named PCG_AcctLRU.

OK

> 
> >  };
> >
> >  #define TESTPCGFLAG(uname, lname)			\
> > @@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> >
> > +SETPCGFLAG(Root, ROOT)
> > +CLEARPCGFLAG(Root, ROOT)
> > +TESTPCGFLAG(Root, ROOT)
> > +
> > +SETPCGFLAG(Acct, ACCT)
> > +CLEARPCGFLAG(Acct, ACCT)
> > +TESTPCGFLAG(Acct, ACCT)
> > +
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> >  	return page_to_nid(pc->page);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 9712ef7..18d2819 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account
> > = 0 */
> > @@ -196,6 +197,10 @@ enum charge_type {
> >  #define PCGF_CACHE	(1UL << PCG_CACHE)
> >  #define PCGF_USED	(1UL << PCG_USED)
> >  #define PCGF_LOCK	(1UL << PCG_LOCK)
> > +/* Not used, but added here for completeness */
> > +#define PCGF_ROOT	(1UL << PCG_ROOT)
> > +#define PCGF_ACCT	(1UL << PCG_ACCT)
> > +
> >  static const unsigned long
> >  pcg_default_flags[NR_CHARGE_TYPE] = {
> >  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> > @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum
> > lru_list lru)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> >  	/* can happen while we handle swapcache. */
> > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> >  		return;
> >  	/*
> >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum
> > lru_list lru)
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	mem = pc->mem_cgroup;
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +	ClearPageCgroupAcct(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_del_init(&pc->lru);
> >  	return;
> >  }
> 
> 
> > @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page,
> > enum lru_list lru)
> >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >  	 */
> >  	smp_rmb();
> > -	/* unused page is not rotated. */
> > -	if (!PageCgroupUsed(pc))
> > +	/* unused or root page is not rotated. */
> > +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum
> > lru_list lru)
> >
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +	SetPageCgroupAcct(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> I think setting/clearing the flag here adds a race condition, because
> pc->flags is modified by
>   pc->flags = pcg_default_flags[ctype] in commit_charge()
> You have to change the line above to be
> 
>   SetPageCgroupCache(pc) or some..
>   ...
>   SetPageCgroupUsed(pc)

Good Point

> 
> Then, you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so,
>  non atomic code is used.)

OK.. I wonder if we can say that the _ACCT and _ROOT flags are protected
by zone->lru_lock. I have not fully checked which locks are held in
commit_charge, but we could potentially do that. Need some more thinking.

-- 
	Balbir


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 18:16 ` Balbir Singh
@ 2009-05-18 10:11   ` KAMEZAWA Hiroyuki
  2009-05-18 10:45     ` Balbir Singh
  2009-05-31 23:51     ` Balbir Singh
  0 siblings, 2 replies; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-18 10:11 UTC
  To: balbir
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

On Fri, 15 May 2009 23:46:39 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> 
> > Balbir Singh wrote:
> > > Feature: Remove the overhead associated with the root cgroup
> > >
> > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > >
> > > This patch changes the memory cgroup and removes the overhead associated
> > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > can
> > > no longer set a memory hard limit in the root cgroup.
> > >
> > > A new flag is used to track page_cgroup associated with the root cgroup
> > > pages. A new flag to track whether the page has been accounted or not
> > > has been added as well.
> > >
> > > Review comments highly appreciated
> > >
> > > Tests
> > >
> > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > 2. For the root cgroup tested performance impact with reaim
> > >
> > >
> > > 		+patch		mmtom-08-may-2009
> > > AIM9		1362.93		1338.17
> > > Dbase		17457.75	16021.58
> > > New Dbase	18070.18	16518.54
> > > Shared		9681.85		8882.11
> > > Compute		16197.79	15226.13
> > >
> > Hmm, at first impression, I'm not convinced by these numbers...
> > Does just avoiding list_add/del really make programs _10%_ faster?
> > Could you show the change in CPU cache-miss rate, if you can?
> > (And why does AIM9 go bad?)
> 
> OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> this as well
> 
tested aim7 with some config.

CPU: Xeon 3.1GHz/4Core x2 (8cpu)
Memory: 32G
HDD: Usual? Scsi disk (just 1 disk)
(try_to_free_pages() etc...will never be called.)

Multiuser config. #of tasks 1100 (near to peak on my host)

10runs.
rc6mm1 score(Jobs/min)
44009.1 44844.5 44691.1 43981.9 44992.6
44544.9 44179.1 44283.0 44442.9 45033.8  average=44500

+patch
44656.8 44270.8 44706.7 44106.1 44467.6
44585.3 44167.0 44756.7 44853.9 44249.4  average=44482

Dbase config. #of tasks 25
rc6mm1 score (jobs/min)
11022.7 11018.9 11037.9 11003.8 11087.5 
11145.2 11133.6 11068.3 11091.3 11106.6 average=11071

+patch
10888.0 10973.7 10913.9 11000.0 10984.9
10996.2 10969.9 10921.3 10921.3 11053.1 average=10962

Hmm, 1% improvement ?
(I think this is reasonable score of the effect of this patch)
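
A small sketch to recompute the averages and the relative difference from the
scores exactly as they are labelled above. The helper names (mean, pct_change)
and the array names are mine; the numbers are copied from this mail.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of "after" versus "before", in percent. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Multiuser config, jobs/min, 10 runs each. */
static const double multiuser_base[10] = {
	44009.1, 44844.5, 44691.1, 43981.9, 44992.6,
	44544.9, 44179.1, 44283.0, 44442.9, 45033.8,
};
static const double multiuser_patch[10] = {
	44656.8, 44270.8, 44706.7, 44106.1, 44467.6,
	44585.3, 44167.0, 44756.7, 44853.9, 44249.4,
};

/* Dbase config, jobs/min, 10 runs each. */
static const double dbase_base[10] = {
	11022.7, 11018.9, 11037.9, 11003.8, 11087.5,
	11145.2, 11133.6, 11068.3, 11091.3, 11106.6,
};
static const double dbase_patch[10] = {
	10888.0, 10973.7, 10913.9, 11000.0, 10984.9,
	10996.2, 10969.9, 10921.3, 10921.3, 11053.1,
};
```

Calling pct_change(mean(dbase_base, 10), mean(dbase_patch, 10)) gives the
Dbase difference. With the labels as written, the value is negative for both
configs (about -0.04% for Multiuser and about -1.0% for Dbase), so it is worth
double-checking which set of runs belongs to the patched kernel.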

Anyway, I'm afraid of difference between mine and your kernel config.
plz enjoy your travel for now :)

Thanks,
-Kame

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:11   ` KAMEZAWA Hiroyuki
@ 2009-05-18 10:45     ` Balbir Singh
  2009-05-18 16:01       ` KAMEZAWA Hiroyuki
  2009-05-31 23:51     ` Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-18 10:45 UTC
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can
> > > > no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > > 		+patch		mmtom-08-may-2009
> > > > AIM9		1362.93		1338.17
> > > > Dbase		17457.75	16021.58
> > > > New Dbase	18070.18	16518.54
> > > > Shared		9681.85		8882.11
> > > > Compute		16197.79	15226.13
> > > >
> > > Hmm, at first impression, I'm not convinced by these numbers...
> > > Does just avoiding list_add/del really make programs _10%_ faster?
> > > Could you show the change in CPU cache-miss rate, if you can?
> > > (And why does AIM9 go bad?)
> > 
> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > this as well
> > 
> tested aim7 with some config.
> 
> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> Memory: 32G
> HDD: Usual? Scsi disk (just 1 disk)
> (try_to_free_pages() etc...will never be called.)
> 
> Multiuser config. #of tasks 1100 (near to peak on my host)
> 
> 10runs.
> rc6mm1 score(Jobs/min)
> 44009.1 44844.5 44691.1 43981.9 44992.6
> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> 
> +patch
> 44656.8 44270.8 44706.7 44106.1 44467.6
> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> 
> Dbase config. #of tasks 25
> rc6mm1 score (jobs/min)
> 11022.7 11018.9 11037.9 11003.8 11087.5 
> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> 
> +patch
> 10888.0 10973.7 10913.9 11000.0 10984.9
> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> 
> Hmm, 1% improvement ?
> (I think this is reasonable score of the effect of this patch)
>

Thanks for the test, I have a 4 CPU system and I create 80 users,
larger config shows larger difference at my end. I think even 1% is
quite reasonable as you mentioned. If the patch looks fine, should we
ask for larger testing by Andrew?
 
> Anyway, I'm afraid of difference between mine and your kernel config.
> plz enjoy your travel for now :)

Sorry, I did not send you my .config. Why do you think the .config makes a
difference? I think how AIM is loaded makes the difference, and I also made
one other change to the AIM tests: I run with "sync" linked to
/bin/true, use tmpfs for the temporary partition, and use 20 * number of CPUs
for the number of users.
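
For reference, the user-count rule above as a tiny sketch. The macro and
function names are hypothetical; only the factor of 20 comes from this mail.

```c
/* Users per CPU as described in this mail. */
#define AIM_USERS_PER_CPU 20

/* Number of AIM users for a machine with ncpus CPUs (0 if ncpus <= 0). */
static int aim_users(int ncpus)
{
	return ncpus > 0 ? AIM_USERS_PER_CPU * ncpus : 0;
}
```

On the 4 CPU system mentioned earlier this gives the 80 users used for the
runs.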

If required, I can still send out my .config to you.

-- 
	Balbir


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:45     ` Balbir Singh
@ 2009-05-18 16:01       ` KAMEZAWA Hiroyuki
  2009-05-19 13:18         ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-18 16:01 UTC
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton,
	nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

Balbir Singh wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18
> 19:11:07]:
>
>> On Fri, 15 May 2009 23:46:39 +0530
>> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>
>> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16
>> 02:45:03]:
>> >
>> > > Balbir Singh wrote:
>> > > > Feature: Remove the overhead associated with the root cgroup
>> > > >
>> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
>> > > >
>> > > > This patch changes the memory cgroup and removes the overhead
>> associated
>> > > > with LRU maintenance of all pages in the root cgroup. As a
>> side-effect, we
>> > > > can
>> > > > no longer set a memory hard limit in the root cgroup.
>> > > >
>> > > > A new flag is used to track page_cgroup associated with the root
>> cgroup
>> > > > pages. A new flag to track whether the page has been accounted or
>> not
>> > > > has been added as well.
>> > > >
>> > > > Review comments highly appreciated
>> > > >
>> > > > Tests
>> > > >
>> > > > 1. Tested with allocate, touch and limit test case for a non-root
>> cgroup
>> > > > 2. For the root cgroup tested performance impact with reaim
>> > > >
>> > > >
>> > > > 		+patch		mmtom-08-may-2009
>> > > > AIM9		1362.93		1338.17
>> > > > Dbase		17457.75	16021.58
>> > > > New Dbase	18070.18	16518.54
>> > > > Shared		9681.85		8882.11
>> > > > Compute		16197.79	15226.13
>> > > >
>> > > Hmm, at first impression, I'm not convinced by these numbers...
>> > > Does just avoiding list_add/del really make programs _10%_ faster?
>> > > Could you show the change in CPU cache-miss rate, if you can?
>> > > (And why does AIM9 go bad?)
>> >
>> > OK... I'll try but I am away on travel for 3 weeks :( you can try and
>> run
>> > this as well
>> >
>> tested aim7 with some config.
>>
>> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
>> Memory: 32G
>> HDD: Usual? Scsi disk (just 1 disk)
>> (try_to_free_pages() etc...will never be called.)
>>
>> Multiuser config. #of tasks 1100 (near to peak on my host)
>>
>> 10runs.
>> rc6mm1 score(Jobs/min)
>> 44009.1 44844.5 44691.1 43981.9 44992.6
>> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
>>
>> +patch
>> 44656.8 44270.8 44706.7 44106.1 44467.6
>> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
>>
>> Dbase config. #of tasks 25
>> rc6mm1 score (jobs/min)
>> 11022.7 11018.9 11037.9 11003.8 11087.5
>> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
>>
>> +patch
>> 10888.0 10973.7 10913.9 11000.0 10984.9
>> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
>>
>> Hmm, 1% improvement ?
>> (I think this is reasonable score of the effect of this patch)
>>
>
> Thanks for the test, I have a 4 CPU system and I create 80 users,
> larger config shows larger difference at my end.
Sorry, above Dbase test was on 54 threads. I'll try 20*8=160 threads.

> I think even 1% is
> quite reasonable as you mentioned. If the patch looks fine, should we
> ask for larger testing by Andrew?
>
Hmm, as you like. My interest is bugfix for swap leaking now.
Because this change adds a big special case, we need a lot of testing anyway.
And please show the _environment_ where the benchmarks run.
BTW, I wonder whether we can get more improvement in this special case...

>> Anyway, I'm afraid of difference between mine and your kernel config.
>> plz enjoy your travel for now :)
>
> Sorry, I did not send you my .config, why do you think .config makes a
> difference?
I wanted to know what kinds of DEBUG/TRACE options are enabled, among other things.

> I think loading AIM makes the difference and I also made
> one other change to the aim tests. I run with "sync" linked to
> /bin/true and use tmpfs for temporary partition and 20*number of cpus
> for number of users.
>
Is that the usual way of running AIM? (Sorry, I'm not sure.)
It seems to defeat AIM7's purpose of "measuring a typical workload"...

> If required, I can still send out my .config to you.
>
If you can, plz. (just for my interest ;)

Thanks,
-Kame



* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 16:01       ` KAMEZAWA Hiroyuki
@ 2009-05-19 13:18         ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2009-05-19 13:18 UTC
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

[-- Attachment #1: Type: text/plain, Size: 4556 bytes --]

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-19 01:01:00]:

> Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18
> > 19:11:07]:
> >
> >> On Fri, 15 May 2009 23:46:39 +0530
> >> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >>
> >> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16
> >> 02:45:03]:
> >> >
> >> > > Balbir Singh wrote:
> >> > > > Feature: Remove the overhead associated with the root cgroup
> >> > > >
> >> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >> > > >
> >> > > > This patch changes the memory cgroup and removes the overhead
> >> associated
> >> > > > with LRU maintenance of all pages in the root cgroup. As a
> >> side-effect, we
> >> > > > can
> >> > > > no longer set a memory hard limit in the root cgroup.
> >> > > >
> >> > > > A new flag is used to track page_cgroup associated with the root
> >> cgroup
> >> > > > pages. A new flag to track whether the page has been accounted or
> >> not
> >> > > > has been added as well.
> >> > > >
> >> > > > Review comments highly appreciated
> >> > > >
> >> > > > Tests
> >> > > >
> >> > > > 1. Tested with allocate, touch and limit test case for a non-root
> >> cgroup
> >> > > > 2. For the root cgroup tested performance impact with reaim
> >> > > >
> >> > > >
> >> > > > 		+patch		mmtom-08-may-2009
> >> > > > AIM9		1362.93		1338.17
> >> > > > Dbase		17457.75	16021.58
> >> > > > New Dbase	18070.18	16518.54
> >> > > > Shared		9681.85		8882.11
> >> > > > Compute		16197.79	15226.13
> >> > > >
> >> > > Hmm, at first impression, I'm not convinced by these numbers...
> >> > > Does just avoiding list_add/del really make programs _10%_ faster?
> >> > > Could you show the change in CPU cache-miss rate, if you can?
> >> > > (And why does AIM9 go bad?)
> >> >
> >> > OK... I'll try but I am away on travel for 3 weeks :( you can try and
> >> run
> >> > this as well
> >> >
> >> tested aim7 with some config.
> >>
> >> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> >> Memory: 32G
> >> HDD: Usual? Scsi disk (just 1 disk)
> >> (try_to_free_pages() etc...will never be called.)
> >>
> >> Multiuser config. #of tasks 1100 (near to peak on my host)
> >>
> >> 10runs.
> >> rc6mm1 score(Jobs/min)
> >> 44009.1 44844.5 44691.1 43981.9 44992.6
> >> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> >>
> >> +patch
> >> 44656.8 44270.8 44706.7 44106.1 44467.6
> >> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> >>
> >> Dbase config. #of tasks 25
> >> rc6mm1 score (jobs/min)
> >> 11022.7 11018.9 11037.9 11003.8 11087.5
> >> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> >>
> >> +patch
> >> 10888.0 10973.7 10913.9 11000.0 10984.9
> >> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> >>
> >> Hmm, 1% improvement ?
> >> (I think this is reasonable score of the effect of this patch)
> >>
> >
> > Thanks for the test, I have a 4 CPU system and I create 80 users,
> > larger config shows larger difference at my end.
> Sorry, above Dbase test was on 54 threads. I'll try 20*8=160 threads.
>

cool! Thanks
 
> > I think even 1% is
> > quite reasonable as you mentioned. If the patch looks fine, should we
> > ask for larger testing by Andrew?
> >
> Hmm, as you like. My interest is bugfix for swap leaking now.

I've seen that too.. I think that has been going on for long and I am
afraid it is hurting features like soft limit, but bug fixing is
important. Hopefully we'll have a good solution soon.

> Because this change adds a big special case, we need a lot of testing anyway.
> And please show the _environment_ where the benchmarks run.
> BTW, I wonder whether we can get more improvement in this special case...
> 
> >> Anyway, I'm afraid of difference between mine and your kernel config.
> >> plz enjoy your travel for now :)
> >
> > Sorry, I did not send you my .config, why do you think .config makes a
> > difference?
> I wanted to know what kinds of DEBUG/TRACE options are enabled, among other things.
> 
> > I think loading AIM makes the difference and I also made
> > one other change to the aim tests. I run with "sync" linked to
> > /bin/true and use tmpfs for temporary partition and 20*number of cpus
> > for number of users.
> >
> Is that the usual way of running AIM? (Sorry, I'm not sure.)
> It seems to defeat AIM7's purpose of "measuring a typical workload"...
> 

No.. it is not.. but sync has a large overhead, so I use /bin/true. I
can try without it and report back.


> > If required, I can still send out my .config to you.
> >
> If you can, plz. (just for my interest ;)
>

Attached, please see 

-- 
	Balbir

[-- Attachment #2: config-2.6.30-rc4-mm1 --]
[-- Type: text/plain, Size: 54827 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4-mm1
# Wed May 13 17:51:31 2009
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
# CONFIG_CLASSIC_RCU is not set
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_MM_OWNER=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB_ALLOCATOR is not set
# CONFIG_SLUB_ALLOCATOR is not set
CONFIG_SLQB_ALLOCATOR=y
CONFIG_SLQB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=m
CONFIG_OPROFILE_IBS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_API_DEBUG=y
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_UTRACE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_FREEZER=y

#
# Processor type and features
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_X2APIC=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
# CONFIG_X86_UV is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_MEMTEST=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_X86_DS=y
CONFIG_X86_PTRACE_BTS=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_AMD_IOMMU=y
# CONFIG_AMD_IOMMU_STATS is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=32
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_X86_CPU_DEBUG is not set
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
# CONFIG_K8_NUMA is not set
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=m
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
# CONFIG_EFI is not set
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_SCHED_HRTICK is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
# CONFIG_ACPI_SBS is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=y
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
CONFIG_INTR_REMAP=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
# CONFIG_HT_IRQ is not set
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
# CONFIG_PCCARD is not set
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
CONFIG_IA32_AOUT=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=y
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_STP=m
CONFIG_BRIDGE=m
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
# CONFIG_WIRELESS_OLD_REGULATORY is not set
# CONFIG_WIRELESS_EXT is not set
# CONFIG_LIB80211 is not set
# CONFIG_MAC80211 is not set
CONFIG_MAC80211_DEFAULT_PS_VALUE=0
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_VIRTIO_BLK is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_ISL29003 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
# CONFIG_IDE_GD_ATAPI is not set
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEACPI=y
# CONFIG_IDE_TASK_IOCTL is not set
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_PLATFORM is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=y
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8172 is not set
# CONFIG_BLK_DEV_IT8213 is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
CONFIG_BLK_DEV_PDC202XX_NEW=y
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=4000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=m
CONFIG_ATA_PIIX=m
# CONFIG_SATA_MV is not set
CONFIG_SATA_NV=m
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SX4 is not set
CONFIG_SATA_SIL=m
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_ULI is not set
CONFIG_SATA_VIA=m
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID10 is not set
CONFIG_MD_RAID456=m
CONFIG_MD_RAID6_PQ=m
CONFIG_MD_MULTIPATH=m
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_MIRROR is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
CONFIG_EQUALIZER=m
CONFIG_TUN=y
CONFIG_VETH=m
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
# CONFIG_TYPHOON is not set
# CONFIG_ETHOC is not set
# CONFIG_DNET is not set
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
CONFIG_TULIP=y
# CONFIG_TULIP_MWI is not set
# CONFIG_TULIP_MMIO is not set
# CONFIG_TULIP_NAPI is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
# CONFIG_ULI526X is not set
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
CONFIG_AMD8111_ETH=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_B44=y
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=y
# CONFIG_FORCEDETH_NAPI is not set
CONFIG_E100=y
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
CONFIG_8139CP=y
CONFIG_8139TOO=y
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
# CONFIG_SC92031 is not set
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
CONFIG_E1000=y
CONFIG_E1000E=y
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_TIGON3=y
CONFIG_BNX2=y
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
CONFIG_NETDEV_10000=y
# CONFIG_CHELSIO_T1 is not set
CONFIG_CHELSIO_T3_DEPENDS=y
# CONFIG_CHELSIO_T3 is not set
# CONFIG_ENIC is not set
# CONFIG_IXGBE is not set
# CONFIG_IXGB is not set
CONFIG_S2IO=m
# CONFIG_VXGE is not set
# CONFIG_MYRI10GE is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_NIU is not set
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_QLGE is not set
# CONFIG_SFC is not set
# CONFIG_BE2NET is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_VIRTIO_NET=m
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=m
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_DEVKMEM=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=y
CONFIG_HW_RANDOM_AMD=y
# CONFIG_HW_RANDOM_VIRTIO is not set
# CONFIG_NVRAM is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
# CONFIG_PC8736x_GPIO is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
# CONFIG_I2C_VOODOO3 is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_STUB is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set

#
# PPS support
#
# CONFIG_PPS is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_BQ27x00 is not set
CONFIG_HWMON=m
# CONFIG_HWMON_VID is not set
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_FSCPOS is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_CORETEMP=m
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
# CONFIG_VIDEO_DEV is not set
# CONFIG_DVB_CORE is not set
# CONFIG_VIDEO_MEDIA is not set

#
# Multimedia drivers
#
CONFIG_DAB=y
# CONFIG_USB_DABUSB is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=y
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_VESA is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_LE80578 is not set
CONFIG_FB_INTEL=y
# CONFIG_FB_INTEL_DEBUG is not set
CONFIG_FB_INTEL_I2C=y
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=256
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_MONO=y
CONFIG_LOGO_LINUX_VGA16=y
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
# CONFIG_SND is not set
CONFIG_SOUND_PRIME=y
# CONFIG_SOUND_OSS is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HID_DEBUG=y
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=m
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
CONFIG_HID_APPLE=m
CONFIG_HID_BELKIN=m
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=m
CONFIG_HID_CYPRESS=m
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EZKEY=m
CONFIG_HID_KYE=m
CONFIG_HID_GYRATION=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LOGITECH=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_NTRIG=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_SAMSUNG=m
CONFIG_HID_SONY=m
CONFIG_HID_SUNPLUS=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_TOPSEED=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=m
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICE_CLASS=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
CONFIG_USB_MON=m
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=m
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# Enable Host or Gadget support to see Inventra options
#

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_BERRY_CHARGE is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS1685 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=m
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=m
# CONFIG_UIO_CIF is not set
# CONFIG_UIO_PDRV is not set
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_SMX is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
CONFIG_INTEL_MENLOW=m
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
# CONFIG_EXT3_FS_SECURITY is not set
# CONFIG_EXT4_FS is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_REISER4_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
# CONFIG_CONFIGFS_FS is not set
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_NILFS2_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
CONFIG_ROOT_NFS=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
# CONFIG_NFSD_V4 is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_SLQB_DEBUG=y
# CONFIG_SLQB_DEBUG_ON is not set
# CONFIG_SLQB_SYSFS is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_DEBUG_SYNCHRO_TEST is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_DEBUG is not set
CONFIG_X86_DS_SELFTEST=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
CONFIG_SECURITYFS=y
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_IMA is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
# CONFIG_CRYPTO_CBC is not set
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_AES_X86_64 is not set
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
# CONFIG_KVM_TRACE is not set
CONFIG_VIRTIO=m
CONFIG_VIRTIO_RING=m
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_BALLOON=m
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC_T10DIF=y
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:11   ` KAMEZAWA Hiroyuki
  2009-05-18 10:45     ` Balbir Singh
@ 2009-05-31 23:51     ` Balbir Singh
  2009-06-01 23:57       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-31 23:51 UTC (permalink / raw)
  To: Andrew Morton, KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > > 		+patch		mmtom-08-may-2009
> > > > AIM9		1362.93		1338.17
> > > > Dbase		17457.75	16021.58
> > > > New Dbase	18070.18	16518.54
> > > > Shared		9681.85		8882.11
> > > > Compute		16197.79	15226.13
> > > >
> > > Hmm, at first impression, I am not convinced by the numbers...
> > > Does just avoiding list_add/del make programs _10%_ faster?
> > > Could you show the change in the CPU cache-miss rate if you can?
> > > (And why does Aim9 go bad?)
> > 
> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > this as well
> > 
> tested aim7 with some config.
> 
> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> Memory: 32G
> HDD: Usual? Scsi disk (just 1 disk)
> (try_to_free_pages() etc...will never be called.)
> 
> Multiuser config. #of tasks 1100 (near to peak on my host)
> 
> 10runs.
> rc6mm1 score(Jobs/min)
> 44009.1 44844.5 44691.1 43981.9 44992.6
> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> 
> +patch
> 44656.8 44270.8 44706.7 44106.1 44467.6
> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> 
> Dbase config. #of tasks 25
> rc6mm1 score (jobs/min)
> 11022.7 11018.9 11037.9 11003.8 11087.5 
> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> 
> +patch
> 10888.0 10973.7 10913.9 11000.0 10984.9
> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> 
> Hmm, 1% improvement ?
> (I think this is reasonable score of the effect of this patch)
> 
> Anyway, I'm afraid of difference between mine and your kernel config.
> plz enjoy your travel for now :)
>


Hi, Andrew,

Could you please pick up these patches for testing? Kamezawa-San, I am
assuming that you are OK with these patches going to -mm for testing?

Would you like me to resend the patches?

Balbir 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-31 23:51     ` Balbir Singh
@ 2009-06-01 23:57       ` KAMEZAWA Hiroyuki
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-01 23:57 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

On Mon, 1 Jun 2009 07:51:21 +0800
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:
> 
> > On Fri, 15 May 2009 23:46:39 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > > 
> > > > Balbir Singh wrote:
> > > > > Feature: Remove the overhead associated with the root cgroup
> > > > >
> > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > > >
> > > > > This patch changes the memory cgroup and removes the overhead associated
> > > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > > can no longer set a memory hard limit in the root cgroup.
> > > > >
> > > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > > pages. A new flag to track whether the page has been accounted or not
> > > > > has been added as well.
> > > > >
> > > > > Review comments highly appreciated
> > > > >
> > > > > Tests
> > > > >
> > > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > > 2. For the root cgroup tested performance impact with reaim
> > > > >
> > > > >
> > > > > 		+patch		mmtom-08-may-2009
> > > > > AIM9		1362.93		1338.17
> > > > > Dbase		17457.75	16021.58
> > > > > New Dbase	18070.18	16518.54
> > > > > Shared		9681.85		8882.11
> > > > > Compute		16197.79	15226.13
> > > > >
> > > > Hmm, at first impression, I am not convinced by the numbers...
> > > > Does just avoiding list_add/del make programs _10%_ faster?
> > > > Could you show the change in the CPU cache-miss rate if you can?
> > > > (And why does Aim9 go bad?)
> > > 
> > > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > > this as well
> > > 
> > tested aim7 with some config.
> > 
> > CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> > Memory: 32G
> > HDD: Usual? Scsi disk (just 1 disk)
> > (try_to_free_pages() etc...will never be called.)
> > 
> > Multiuser config. #of tasks 1100 (near to peak on my host)
> > 
> > 10runs.
> > rc6mm1 score(Jobs/min)
> > 44009.1 44844.5 44691.1 43981.9 44992.6
> > 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> > 
> > +patch
> > 44656.8 44270.8 44706.7 44106.1 44467.6
> > 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> > 
> > Dbase config. #of tasks 25
> > rc6mm1 score (jobs/min)
> > 11022.7 11018.9 11037.9 11003.8 11087.5 
> > 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> > 
> > +patch
> > 10888.0 10973.7 10913.9 11000.0 10984.9
> > 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> > 
> > Hmm, 1% improvement ?
> > (I think this is reasonable score of the effect of this patch)
> > 
> > Anyway, I'm afraid of difference between mine and your kernel config.
> > plz enjoy your travel for now :)
> >
> 
> 
> Hi, Andrew,
> 
> Could you please pick up these patches for testing? Kamezawa-San, I am
> assuming that you are OK with these patches going to -mm for testing?
> 
o.k. but..

> Would you like me to resend the patches?
> 
It has been 2 weeks since the original post, and several bug fixes have been
merged since then. Could you post again? (It also seems Nishimura-san posted
some comments.) Of course, I'll test again.

Thanks,
-Kame


> Balbir 


* Low overhead patches for the memory cgroup controller (v3)
  2009-06-01 23:57       ` KAMEZAWA Hiroyuki
@ 2009-06-05  5:31         ` Balbir Singh
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
                             ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-05  5:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Here is the new version of the patch with the RFC tag dropped. Andrew,
Kame, could you please take a look? I am just about to fly out to get
back home tomorrow, so there might be some silence, unless I get to
the next WiFi enabled airport.


From: Balbir Singh <balbir@linux.vnet.ibm.com>

Changelog v2 -> v3

1. Rebase to mmotm 2nd June 2009
2. Test with some of the test cases recommended by Daisuke-San

Changelog v1 -> v2

1. Fix and implement review comments.

Feature: Remove the overhead associated with the root cgroup

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup.
pcg_default_flags is now obsolete, but I have not removed it yet because
it helps the readability of the code.
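
As an illustration of the flag handling described above, here is a minimal
userspace sketch, not the kernel code. It assumes C11 <stdatomic.h> as a
stand-in for the kernel's atomic bit operations. The names pcg_model,
pcg_init, pcg_set, pcg_clear and pcg_test are hypothetical; only the PCG_*
bit names are taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers, in the same order as the enum in the patch. */
enum {
	PCG_LOCK,     /* page cgroup is locked */
	PCG_CACHE,    /* charged as cache */
	PCG_USED,     /* this object is in use */
	PCG_ROOT,     /* page belongs to root cgroup */
	PCG_ACCT_LRU, /* page has been accounted for */
};

/* Hypothetical stand-in for struct page_cgroup: only the flags word. */
struct pcg_model {
	atomic_ulong flags;
};

/* Start with no flags set. */
static inline void pcg_init(struct pcg_model *pc)
{
	atomic_init(&pc->flags, 0UL);
}

/* Atomically set one flag bit; the other bits are left untouched. */
static inline void pcg_set(struct pcg_model *pc, int bit)
{
	assert(bit >= PCG_LOCK && bit <= PCG_ACCT_LRU);
	atomic_fetch_or(&pc->flags, 1UL << bit);
}

/* Atomically clear one flag bit; the other bits are left untouched. */
static inline void pcg_clear(struct pcg_model *pc, int bit)
{
	assert(bit >= PCG_LOCK && bit <= PCG_ACCT_LRU);
	atomic_fetch_and(&pc->flags, ~(1UL << bit));
}

/* Report whether one flag bit is currently set. */
static inline bool pcg_test(struct pcg_model *pc, int bit)
{
	assert(bit >= PCG_LOCK && bit <= PCG_ACCT_LRU);
	return (atomic_load(&pc->flags) >> bit) & 1UL;
}
```

Because each helper touches a single bit with one atomic read-modify-write,
setting PCG_ROOT and PCG_ACCT_LRU one at a time cannot lose an update made
concurrently to another bit. That is the property a precomputed default
flags word assigned in one plain store does not give.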

Test Results:

Obtained by

1. Using tmpfs for mounting the filesystem
2. Changing sync to be /bin/true (so that sync is not the bottleneck)
3. Running reaim with -s #cpus*40 -e #cpus*40

Reaim
		without patch	with patch
AIM9		9532.48		9807.59
dbase		19344.60	19285.71
new_dbase	20101.65	20163.13
shared		11827.77	11886.65
compute		17317.38	17420.05
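
To make the table above easier to read, the relative change per workload
can be computed as (patched - base) / base * 100. The sketch below is only
an illustration: the struct and function names (reaim_row, pct_change,
print_reaim_deltas) are hypothetical, and the numbers are copied from the
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the Reaim table (scores copied from the table above). */
struct reaim_row {
	const char *workload;
	double without_patch;
	double with_patch;
};

static const struct reaim_row reaim_rows[] = {
	{ "AIM9",      9532.48,  9807.59 },
	{ "dbase",     19344.60, 19285.71 },
	{ "new_dbase", 20101.65, 20163.13 },
	{ "shared",    11827.77, 11886.65 },
	{ "compute",   17317.38, 17420.05 },
};

/* Relative change of the patched score against the baseline, in percent. */
double pct_change(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}

/* Print one line per workload with its signed percentage change. */
void print_reaim_deltas(void)
{
	size_t i;

	for (i = 0; i < sizeof(reaim_rows) / sizeof(reaim_rows[0]); i++)
		printf("%-10s %+.2f%%\n", reaim_rows[i].workload,
		       pct_change(reaim_rows[i].without_patch,
				  reaim_rows[i].with_patch));
}
```

Calling print_reaim_deltas() from a main() gives roughly +2.9% for AIM9 and
changes within about +/-0.6% for the other workloads, with dbase slightly
negative.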

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   12 ++++++++++++
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
 mm/page_cgroup.c            |    1 -
 3 files changed, 50 insertions(+), 5 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..41cc16c 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(AcctLru, ACCT_LRU)
+CLEARPCGFLAG(AcctLru, ACCT_LRU)
+TESTPCGFLAG(AcctLru, ACCT_LRU)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a83e039..9561d10 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -197,6 +198,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcctLru(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcctLru(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
+
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index ecc3918..4406a9c 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
@ 2009-06-05  5:51           ` KAMEZAWA Hiroyuki
  2009-06-05  9:33             ` Balbir Singh
  2009-06-05  6:05           ` Daisuke Nishimura
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-05  5:51 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

On Fri, 5 Jun 2009 13:31:07 +0800
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Here is the new version of the patch with the RFC dropped. Andrew,
> Kame, could you please take a look. I am just about to fly out to get
> back home tomorrow, so there might be some silence, unless I get to
> the next WiFi enabled airport.
> 
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Fix and implement review comments.
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> pcg_default_flags is now obsolete, but I've not removed it yet. It
> provides some readability to help the code.
> 
> Tests Results:
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 

A few comments.


> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |   12 ++++++++++++
>  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..41cc16c 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(AcctLru, ACCT_LRU)
> +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> +TESTPCGFLAG(AcctLru, ACCT_LRU)
>  
I prefer AcctLRU to AcctLru. It is written LRU or lru, not Lru, throughout
the kernel.

>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a83e039..9561d10 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -197,6 +198,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */

Could you delete pcg_default_flags? It is of no use after this patch.


> @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
I wonder whether this condition is valid or not.

IMHO, the whole check here should be

==
	if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup)
		return;
	mz = page_cgroup_zoneinfo(pc);
	mem = pc->mem_cgroup;
	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
	ClearPageCgroupAcctLru(pc);
	if (PageCgroupRoot(pc))
		return;
	VM_BUG_ON(list_empty(&pc->lru));
	list_del_init(&pc->lru);
	return;
==

I'm sorry if there is a case where the following holds:
   (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru))
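
The flag-gated add/delete logic discussed here can be sketched in userspace
as follows. This is only a model under stated assumptions: struct pc_model,
struct zone_model and the model_* and list_* helpers are hypothetical
stand-ins for page_cgroup, the per-zone info and the kernel list API, and a
plain assert() plays the role of VM_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_head(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_reinit(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct pc_model {
	bool acct_lru;		/* stands in for PCG_ACCT_LRU */
	bool root;		/* stands in for PCG_ROOT */
	struct list_head lru;
};

struct zone_model {
	long count;		/* stands in for MEM_CGROUP_ZSTAT(mz, lru) */
	struct list_head list;
};

static void model_add_lru(struct zone_model *mz, struct pc_model *pc)
{
	mz->count += 1;
	pc->acct_lru = true;
	if (pc->root)
		return;		/* root pages are counted but never linked */
	list_add_head(&pc->lru, &mz->list);
}

static void model_del_lru(struct zone_model *mz, struct pc_model *pc)
{
	if (!pc->acct_lru)
		return;		/* not accounted, so nothing to undo */
	mz->count -= 1;
	pc->acct_lru = false;
	if (pc->root)
		return;
	assert(!list_is_empty(&pc->lru));	/* plays the role of VM_BUG_ON() */
	list_del_reinit(&pc->lru);
}
```

In this model the flag alone decides whether there is anything to undo, so a
second delete of the same page is a no-op for both root and non-root pages.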


>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
My concern here is that there will be a racy moment where pc->flags shows
  PageCgroupUsed(pc) && !PageCgroupRoot(pc) even though pc->mem_cgroup == root_mem_cgroup.

Then, the order of the code here should be
==
	if (mem == root_mem_cgroup)
		SetPageCgroupRoot(pc);
	pc->mem_cgroup = mem;
	smp_wmb();
	switch (ctype) {
	case....
	}
	/* Used bit is set last. */
==

But I wonder whether it's better to use
==
static inline int page_cgroup_is_under_root(struct page_cgroup *pc)
{
	return pc->mem_cgroup == root_mem_cgroup;
}
==
I'm not sure why the PageCgroupRoot() "bit" is necessary.
Could you clarify the benefit of the Root flag?
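
A userspace sketch of the "publish the pointer first, set Used last" ordering
combined with a pointer comparison instead of a Root bit. It assumes C11
release/acquire atomics as a stand-in for the smp_wmb()/smp_rmb() pairing;
struct pc_model, struct mem_cgroup_model, root_model and the model_*
functions are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup_model { int id; };

struct pc_model {
	struct mem_cgroup_model *mem_cgroup;
	atomic_bool used;	/* stands in for the PCG_USED bit */
};

/* Stand-in for root_mem_cgroup. */
static struct mem_cgroup_model *root_model;

static void model_commit_charge(struct pc_model *pc,
				struct mem_cgroup_model *mem)
{
	pc->mem_cgroup = mem;
	/* Release store: the pointer above is visible before Used is. */
	atomic_store_explicit(&pc->used, true, memory_order_release);
}

static bool model_is_under_root(struct pc_model *pc)
{
	/* Acquire load: pairs with the release store in the commit path. */
	if (!atomic_load_explicit(&pc->used, memory_order_acquire))
		return false;
	return pc->mem_cgroup == root_model;
}
```

Because the root test reads the pointer only after it has observed Used,
there is no window in this model where a used page of the root group looks
like a non-root page.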



> @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> +	if (mem == root_mem_cgroup)
> +		ClearPageCgroupRoot(pc);
>  	/*
>  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
> @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index ecc3918..4406a9c 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
>  
>  #endif
>  
> -
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  
>  static DEFINE_MUTEX(swap_cgroup_mutex);
> 
Unnecessary diff here.

Thanks,
-Kame

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
@ 2009-06-05  9:33             ` Balbir Singh
  2009-06-08  0:20               ` Daisuke Nishimura
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-05  9:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-05 14:51:41]:

> On Fri, 5 Jun 2009 13:31:07 +0800
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > Here is the new version of the patch with the RFC dropped. Andrew,
> > Kame, could you please take a look. I am just about to fly out to get
> > back home tomorrow, so there might be some silence, unless I get to
> > the next WiFi enabled airport.
> > 
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > Changelog v3 -> v2
> > 
> > 1. Rebase to mmotm 2nd June 2009
> > 2. Test with some of the test cases recommended by Daisuke-San
> > 
> > Changelog v2 -> v1
> > 1. Fix and implement review comments.
> > 
> > Feature: Remove the overhead associated with the root cgroup
> > 
> > This patch changes the memory cgroup and removes the overhead associated
> > with accounting all pages in the root cgroup. As a side-effect, we can
> > no longer set a memory hard limit in the root cgroup.
> > 
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well. Flags are now set atomically for page_cgroup,
> > pcg_default_flags is now obsolete, but I've not removed it yet. It
> > provides some readability to help the code.
> > 
> > Tests Results:
> > 
> > Obtained by
> > 
> > 1. Using tmpfs for mounting filesystem
> > 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> > 3. Used -s #cpus*40 -e #cpus*40
> > 
> > Reaim
> > 		withoutpatch	patch
> > AIM9		9532.48		9807.59
> > dbase		19344.60	19285.71
> > new_dbase	20101.65	20163.13
> > shared		11827.77	11886.65
> > compute		17317.38	17420.05
> > 
> 
> A few comments.
> 
> 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> > 
> >  include/linux/page_cgroup.h |   12 ++++++++++++
> >  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 50 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..41cc16c 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ROOT, /* page belongs to root cgroup */
> > +	PCG_ACCT_LRU, /* page has been accounted for */
> >  };
> >  
> >  #define TESTPCGFLAG(uname, lname)			\
> > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
> >  
> >  /* Cache flag is set only once (at allocation) */
> >  TESTPCGFLAG(Cache, CACHE)
> > +SETPCGFLAG(Cache, CACHE)
> >  
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> > +SETPCGFLAG(Used, USED)
> > +
> > +SETPCGFLAG(Root, ROOT)
> > +CLEARPCGFLAG(Root, ROOT)
> > +TESTPCGFLAG(Root, ROOT)
> > +
> > +SETPCGFLAG(AcctLru, ACCT_LRU)
> > +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> > +TESTPCGFLAG(AcctLru, ACCT_LRU)
> >  
> I prefer AcctLRU to AcctLru. It is written LRU or lru, not Lru, throughout
> the kernel.

OK, I'll make that change. I agree LRU is better.

> 
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index a83e039..9561d10 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >  
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >  
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> > @@ -197,6 +198,10 @@ enum charge_type {
> >  #define PCGF_CACHE	(1UL << PCG_CACHE)
> >  #define PCGF_USED	(1UL << PCG_USED)
> >  #define PCGF_LOCK	(1UL << PCG_LOCK)
> > +/* Not used, but added here for completeness */
> > +#define PCGF_ROOT	(1UL << PCG_ROOT)
> > +#define PCGF_ACCT	(1UL << PCG_ACCT)
> > +
> >  static const unsigned long
> >  pcg_default_flags[NR_CHARGE_TYPE] = {
> >  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> 
> Could you delete pcg_default_flags? It is of no use after this patch.
>

Yes, I mentioned in the comment that they are for readability of the
code. I can remove them if required.
 
> 
> > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> >  	/* can happen while we handle swapcache. */
> > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> >  		return;
> I wonder whether this condition is valid or not.
> 
> IMHO, the whole check here should be
> 
> ==
> 	if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup)
> 		return;
> 	mz = page_cgroup_zoneinfo(pc);
> 	mem = pc->mem_cgroup;
> 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> 	ClearPageCgroupAcctLru(pc);
> 	if (PageCgroupRoot(pc))
> 		return;
> 	VM_BUG_ON(list_empty(&pc->lru));
> 	list_del_init(&pc->lru);
> 	return;

We needed this check because

1. After the introduction of PageCgroupRoot(), list_empty() will always
return true for the root cgroup
2. For non-root cgroups, it won't

The check is enhanced to say: don't go by list_empty(), look to see if
this is root.

I think we can change the condition and stop relying on list_empty()
for the check. I agree.


> ==
> 
> I'm sorry if there is a case where the following holds:
>    (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru))
>

There should not be. I think list_empty() was used to indicate an already
unaccounted page, so explicit flags should work fine.
 
> 
> >  	/*
> >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	mem = pc->mem_cgroup;
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +	ClearPageCgroupAcctLru(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_del_init(&pc->lru);
> >  	return;
> >  }
> > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >  	 */
> >  	smp_rmb();
> > -	/* unused page is not rotated. */
> > -	if (!PageCgroupUsed(pc))
> > +	/* unused or root page is not rotated. */
> > +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >  
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +	SetPageCgroupAcctLru(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> >  
> > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >  
> >  	mem_cgroup_charge_statistics(mem, pc, true);
> >  
> My concern here is that there will be a racy moment where pc->flags shows
>   PageCgroupUsed(pc) && !PageCgroupRoot(pc) even though pc->mem_cgroup == root_mem_cgroup.
> 
> Then, the order of the code here should be
> ==
> 	if (mem == root_mem_cgroup)
> 		SetPageCgroupRoot(pc);
> 	pc->mem_cgroup = mem;
> 	smp_wmb();
> 	switch (ctype) {
> 	case....
> 	}
> 	/* Used bit is set last. */
> ==
> 
> But I wonder whether it's better to use
> ==
> static inline int page_cgroup_is_under_root(struct page_cgroup *pc)
> {
> 	return pc->mem_cgroup == root_mem_cgroup;
> }
> ==
> I'm not sure why the PageCgroupRoot() "bit" is necessary.
> Could you clarify the benefit of the Root flag?

The Root flag was used for accounting, but I think we can start
removing it now.

> 
> 
> 
> > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> >  	mem_cgroup_charge_statistics(mem, pc, false);
> >  
> >  	ClearPageCgroupUsed(pc);
> > +	if (mem == root_mem_cgroup)
> > +		ClearPageCgroupRoot(pc);
> >  	/*
> >  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
> >  	 * freed from LRU. This is safe because uncharged page is expected not
> > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> >  	name = MEMFILE_ATTR(cft->private);
> >  	switch (name) {
> >  	case RES_LIMIT:
> > +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >  		/* This function does all necessary parse...reuse it */
> >  		ret = res_counter_memparse_write_strategy(buffer, &val);
> >  		if (ret)
> > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	if (cont->parent == NULL) {
> >  		enable_swap_cgroup();
> >  		parent = NULL;
> > +		root_mem_cgroup = mem;
> >  	} else {
> >  		parent = mem_cgroup_from_cont(cont->parent);
> >  		mem->use_hierarchy = parent->use_hierarchy;
> > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	return &mem->css;
> >  free_out:
> >  	__mem_cgroup_free(mem);
> > +	root_mem_cgroup = NULL;
> >  	return ERR_PTR(error);
> >  }
> >  
> > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > index ecc3918..4406a9c 100644
> > --- a/mm/page_cgroup.c
> > +++ b/mm/page_cgroup.c
> > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
> >  
> >  #endif
> >  
> > -
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  
> >  static DEFINE_MUTEX(swap_cgroup_mutex);
> > 
> Unnecessary diff here.
>

Yes, I'll add back the space.

Thanks for the review 

-- 
	Balbir

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  9:33             ` Balbir Singh
@ 2009-06-08  0:20               ` Daisuke Nishimura
  0 siblings, 0 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  0:20 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Fri, 5 Jun 2009 17:33:54 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-05 14:51:41]:
> 
> > On Fri, 5 Jun 2009 13:31:07 +0800
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > Here is the new version of the patch with the RFC dropped. Andrew,
> > > Kame, could you please take a look. I am just about to fly out to get
> > > back home tomorrow, so there might be some silence, unless I get to
> > > the next WiFi enabled airport.
> > > 
> > > 
> > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > 
> > > Changelog v3 -> v2
> > > 
> > > 1. Rebase to mmotm 2nd June 2009
> > > 2. Test with some of the test cases recommended by Daisuke-San
> > > 
> > > Changelog v2 -> v1
> > > 1. Fix and implement review comments.
> > > 
> > > Feature: Remove the overhead associated with the root cgroup
> > > 
> > > This patch changes the memory cgroup and removes the overhead associated
> > > with accounting all pages in the root cgroup. As a side-effect, we can
> > > no longer set a memory hard limit in the root cgroup.
> > > 
> > > A new flag is used to track page_cgroup associated with the root cgroup
> > > pages. A new flag to track whether the page has been accounted or not
> > > has been added as well. Flags are now set atomically for page_cgroup,
> > > pcg_default_flags is now obsolete, but I've not removed it yet. It
> > > provides some readability to help the code.
> > > 
> > > Tests Results:
> > > 
> > > Obtained by
> > > 
> > > 1. Using tmpfs for mounting filesystem
> > > 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> > > 3. Used -s #cpus*40 -e #cpus*40
> > > 
> > > Reaim
> > > 		withoutpatch	patch
> > > AIM9		9532.48		9807.59
> > > dbase		19344.60	19285.71
> > > new_dbase	20101.65	20163.13
> > > shared		11827.77	11886.65
> > > compute		17317.38	17420.05
> > > 
> > 
> > A few comments.
> > 
> > 
> > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > ---
> > > 
> > >  include/linux/page_cgroup.h |   12 ++++++++++++
> > >  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
> > >  mm/page_cgroup.c            |    1 -
> > >  3 files changed, 50 insertions(+), 5 deletions(-)
> > > 
> > > 
> > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > > index 7339c7b..41cc16c 100644
> > > --- a/include/linux/page_cgroup.h
> > > +++ b/include/linux/page_cgroup.h
> > > @@ -26,6 +26,8 @@ enum {
> > >  	PCG_LOCK,  /* page cgroup is locked */
> > >  	PCG_CACHE, /* charged as cache */
> > >  	PCG_USED, /* this object is in use. */
> > > +	PCG_ROOT, /* page belongs to root cgroup */
> > > +	PCG_ACCT_LRU, /* page has been accounted for */
> > >  };
> > >  
> > >  #define TESTPCGFLAG(uname, lname)			\
> > > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
> > >  
> > >  /* Cache flag is set only once (at allocation) */
> > >  TESTPCGFLAG(Cache, CACHE)
> > > +SETPCGFLAG(Cache, CACHE)
> > >  
> > >  TESTPCGFLAG(Used, USED)
> > >  CLEARPCGFLAG(Used, USED)
> > > +SETPCGFLAG(Used, USED)
> > > +
> > > +SETPCGFLAG(Root, ROOT)
> > > +CLEARPCGFLAG(Root, ROOT)
> > > +TESTPCGFLAG(Root, ROOT)
> > > +
> > > +SETPCGFLAG(AcctLru, ACCT_LRU)
> > > +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> > > +TESTPCGFLAG(AcctLru, ACCT_LRU)
> > >  
> > I prefer AcctLRU to AcctLru. It is written LRU or lru, not Lru, throughout
> > the kernel.
> 
> OK, I'll make that change. I agree LRU is better.
> 
> > 
> > >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> > >  {
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index a83e039..9561d10 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -43,6 +43,7 @@
> > >  
> > >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> > >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> > >  
> > >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> > > @@ -197,6 +198,10 @@ enum charge_type {
> > >  #define PCGF_CACHE	(1UL << PCG_CACHE)
> > >  #define PCGF_USED	(1UL << PCG_USED)
> > >  #define PCGF_LOCK	(1UL << PCG_LOCK)
> > > +/* Not used, but added here for completeness */
> > > +#define PCGF_ROOT	(1UL << PCG_ROOT)
> > > +#define PCGF_ACCT	(1UL << PCG_ACCT)
> > > +
> > >  static const unsigned long
> > >  pcg_default_flags[NR_CHARGE_TYPE] = {
> > >  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> > 
> > Could you delete pcg_default_flags? It is of no use after this patch.
> >
> 
> Yes, I mentioned in the comment that they are for readability of the
> code. I can remove them if required.
>  
> > 
> > > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> > >  		return;
> > >  	pc = lookup_page_cgroup(page);
> > >  	/* can happen while we handle swapcache. */
> > > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > > +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> > >  		return;
> > I wonder whether this condition is valid or not.
> > 
> > IMHO, the whole check here should be
> > 
> > ==
> > 	if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup)
I think checking !pc->mem_cgroup is also redundant; it can be
changed to VM_BUG_ON().
And wouldn't "if (!TestClearPageCgroupAcctLRU(pc))" be better? We could then
remove ClearPageCgroupAcctLRU().
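
A test-and-clear helper of the kind suggested here can be sketched in
userspace like this. The sketch assumes atomic_fetch_and() as a stand-in for
the kernel's test_and_clear_bit(); MODEL_ACCT_LRU_BIT and
model_test_clear_acct_lru() are hypothetical names, and the bit number is
only assumed to match the enum position in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, assumed to mirror PCG_ACCT_LRU in the enum. */
#define MODEL_ACCT_LRU_BIT 4

/*
 * Clear the AcctLRU bit and report whether it was set before, in one
 * atomic read-modify-write, so the test and the clear cannot be split.
 */
static bool model_test_clear_acct_lru(atomic_ulong *flags)
{
	unsigned long mask = 1UL << MODEL_ACCT_LRU_BIT;
	unsigned long old = atomic_fetch_and(flags, ~mask);

	return (old & mask) != 0;
}
```

A caller would return early when the helper reports false and would need no
separate clear afterwards.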

> > 		return;
> > 	mz = page_cgroup_zoneinfo(pc);
> > 	mem = pc->mem_cgroup;
> > 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > 	ClearPageCgroupAcctLru(pc);
> > 	if (PageCgroupRoot(pc))
> > 		return;
> > 	VM_BUG_ON(list_empty(&pc->lru));
> > 	list_del_init(&pc->lru);
> > 	return;
> 
> We needed this check because
> 
> 1. After the introduction of PageCgroupRoot(), list_empty() will always
> return true for the root cgroup
> 2. For non-root cgroups, it won't
> 
> The check is enhanced to say: don't go by list_empty(), look to see if
> this is root.
> 
> I think we can change the condition and stop relying on list_empty()
> for the check. I agree.
> 
> 
> > ==
> > 
> > I'm sorry if there is a case where the following holds:
> >    (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru))
> >
> 
> There should not be. I think list_empty() was used to indicate an already
> unaccounted page, so explicit flags should work fine.
>  
> > 
> > >  	/*
> > >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> > >  	mz = page_cgroup_zoneinfo(pc);
> > >  	mem = pc->mem_cgroup;
Can you delete this obsolete line?

> > >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > > +	ClearPageCgroupAcctLru(pc);
> > > +	if (PageCgroupRoot(pc))
> > > +		return;
> > >  	list_del_init(&pc->lru);
> > >  	return;
> > >  }
> > > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> > >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> > >  	 */
> > >  	smp_rmb();
> > > -	/* unused page is not rotated. */
> > > -	if (!PageCgroupUsed(pc))
> > > +	/* unused or root page is not rotated. */
> > > +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> > >  		return;
> > >  	mz = page_cgroup_zoneinfo(pc);
> > >  	list_move(&pc->lru, &mz->lists[lru]);
> > > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> > >  
> > >  	mz = page_cgroup_zoneinfo(pc);
> > >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > > +	SetPageCgroupAcctLru(pc);
> > > +	if (PageCgroupRoot(pc))
> > > +		return;
> > >  	list_add(&pc->lru, &mz->lists[lru]);
> > >  }
> > >  
Can you add "VM_BUG_ON(PageCgroupAcctLRU(pc))" in mem_cgroup_add_lru_list() ?
And you should change "list_empty(&pc->lru)" in mem_cgroup_lru_add_after_commit_swapcache()
to "!PageCgroupAcctLRU(pc)".
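
As a rough illustration of the two requests above, here is a user-space
sketch, not kernel code. It assumes the AcctLRU bit is the only indicator of
LRU accounting. pc_model, zone_model, add_lru(), del_lru() and
test_clear_acct_lru() are hypothetical stand-ins, assert() plays the role of
VM_BUG_ON(), and C11 atomics stand in for the kernel's atomic bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit value; it only mirrors the PCG_ACCT_LRU name. */
#define ACCT_LRU_BIT	(1u << 0)

struct pc_model {
	atomic_uint flags;	/* models pc->flags */
	bool is_root;		/* models pc->mem_cgroup == root_mem_cgroup */
	bool linked;		/* models !list_empty(&pc->lru) */
};

struct zone_model {
	long count;		/* models MEM_CGROUP_ZSTAT(mz, lru) */
};

/* Atomically clear the bit and report whether it was set before. */
static bool test_clear_acct_lru(struct pc_model *pc)
{
	unsigned int old = atomic_fetch_and(&pc->flags, ~ACCT_LRU_BIT);

	return (old & ACCT_LRU_BIT) != 0;
}

static void add_lru(struct pc_model *pc, struct zone_model *mz)
{
	/* stand-in for VM_BUG_ON(PageCgroupAcctLRU(pc)) */
	assert(!(atomic_load(&pc->flags) & ACCT_LRU_BIT));
	mz->count += 1;
	atomic_fetch_or(&pc->flags, ACCT_LRU_BIT);
	if (pc->is_root)
		return;		/* root: account only, no list linkage */
	pc->linked = true;
}

static void del_lru(struct pc_model *pc, struct zone_model *mz)
{
	if (!test_clear_acct_lru(pc))
		return;		/* never accounted, nothing to undo */
	mz->count -= 1;
	if (pc->is_root)
		return;
	assert(pc->linked);	/* stand-in for VM_BUG_ON(list_empty(&pc->lru)) */
	pc->linked = false;
}
```

With this shape a second del_lru() on the same entry is a no-op, because the
test-and-clear already consumed the bit, so no list_empty() check is needed.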


Thanks,
Daisuke Nishimura.

> > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> > >  		css_put(&mem->css);
> > >  		return;
> > >  	}
> > > +
> > >  	pc->mem_cgroup = mem;
> > >  	smp_wmb();
> > > -	pc->flags = pcg_default_flags[ctype];
> > > +	switch (ctype) {
> > > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > > +		SetPageCgroupCache(pc);
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > > +	if (mem == root_mem_cgroup)
> > > +		SetPageCgroupRoot(pc);
> > >  
> > >  	mem_cgroup_charge_statistics(mem, pc, true);
> > >  
> > My concern here is there will be a racy moment that pc->flag shows
> >   PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup.
> > 
> > Then, The order of code here should be
> > ==
> > 	if (mem == root_mem_cgroup)
> > 		SetPageCgroupRoot(pc);
> > 	pc->mem_cgroup == mem;;
> > 	smp_wmb();
> > 	switch(type) {
> > 	case....
> > 	}
> > 	// Used bit is set at last.
> > ==
> > 
> > But I wonder it's better to use
> > ==
> > > static inline int page_cgroup_is_under_root(struct page_cgroup *pc)
> > > {
> > > 	return pc->mem_cgroup == root_mem_cgroup;
> > > }
> > ==
> > I'm not sure why PageCgroupRoot() "bit" is necessary.
> > Could you clarify the benefit of Root flag ?
> 
> The Root flags was used for accounting, but I think we can start
> removing it now.
> 
> > 
> > 
> > 
> > > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> > >  	mem_cgroup_charge_statistics(mem, pc, false);
> > >  
> > >  	ClearPageCgroupUsed(pc);
> > > +	if (mem == root_mem_cgroup)
> > > +		ClearPageCgroupRoot(pc);
> > >  	/*
> > >  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
> > >  	 * freed from LRU. This is safe because uncharged page is expected not
> > > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> > >  	name = MEMFILE_ATTR(cft->private);
> > >  	switch (name) {
> > >  	case RES_LIMIT:
> > > +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> > > +			ret = -EINVAL;
> > > +			break;
> > > +		}
> > >  		/* This function does all necessary parse...reuse it */
> > >  		ret = res_counter_memparse_write_strategy(buffer, &val);
> > >  		if (ret)
> > > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  	if (cont->parent == NULL) {
> > >  		enable_swap_cgroup();
> > >  		parent = NULL;
> > > +		root_mem_cgroup = mem;
> > >  	} else {
> > >  		parent = mem_cgroup_from_cont(cont->parent);
> > >  		mem->use_hierarchy = parent->use_hierarchy;
> > > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  	return &mem->css;
> > >  free_out:
> > >  	__mem_cgroup_free(mem);
> > > +	root_mem_cgroup = NULL;
> > >  	return ERR_PTR(error);
> > >  }
> > >  
> > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > > index ecc3918..4406a9c 100644
> > > --- a/mm/page_cgroup.c
> > > +++ b/mm/page_cgroup.c
> > > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
> > >  
> > >  #endif
> > >  
> > > -
> > >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > >  
> > >  static DEFINE_MUTEX(swap_cgroup_mutex);
> > > 
> > Unnecessary diff here.
> >
> 
> Yes, I'll add back the space.
> 
> Thanks for the review 
> 
> -- 
> 	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
@ 2009-06-05  6:05           ` Daisuke Nishimura
  2009-06-05  9:47             ` Balbir Singh
  2009-06-05  6:43           ` Daisuke Nishimura
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  3 siblings, 1 reply; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-05  6:05 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.

> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
I think we need ClearPageCgroupCache() here.
Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
A page can be reused, but we don't clear PCG_CACHE on free/alloc of the page.

> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
I think you should set PCG_ROOT before setting PCG_USED.
IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.
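
The ordering I have in mind could be sketched as below. This is a user-space
model under stated assumptions, not the kernel implementation: pc_model,
commit_charge(), used_and_root() and the F_* bit values are hypothetical, and
C11 release/acquire ordering stands in for the smp_wmb()/smp_rmb() pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values for this sketch; they only mirror the PCG_* names. */
#define F_CACHE	(1u << 0)
#define F_USED	(1u << 1)
#define F_ROOT	(1u << 2)

struct pc_model {
	const void *owner;	/* models pc->mem_cgroup */
	atomic_uint flags;	/* models pc->flags */
};

/* Writer: everything a reader depends on is stored before USED appears. */
static void commit_charge(struct pc_model *pc, const void *mem,
			  const void *root, bool is_cache)
{
	unsigned int bits = 0;

	if (mem == root)
		bits |= F_ROOT;
	if (is_cache)
		bits |= F_CACHE;
	pc->owner = mem;
	atomic_fetch_or_explicit(&pc->flags, bits, memory_order_relaxed);
	/* Release: owner, ROOT and CACHE become visible before USED does. */
	atomic_fetch_or_explicit(&pc->flags, F_USED, memory_order_release);
}

/* Reader: only trust ROOT (and owner) after observing USED. */
static bool used_and_root(struct pc_model *pc)
{
	unsigned int f = atomic_load_explicit(&pc->flags, memory_order_acquire);

	return (f & F_USED) && (f & F_ROOT);
}
```

A reader that sees USED can then never see a root-owned entry without ROOT,
which is the window the current order leaves open.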

>  	mem_cgroup_charge_statistics(mem, pc, true);
>  


Thanks,
Daisuke Nishimura.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  6:05           ` Daisuke Nishimura
@ 2009-06-05  9:47             ` Balbir Singh
  2009-06-08  0:03               ` Daisuke Nishimura
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-05  9:47 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-05 15:05:27]:

> Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.
> 
> > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> I think we need ClearPageCgroupCache() here.
> Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
> A page can be reused, but we don't cleare PCG_CACHE on free/alloc of page.

Yes, I know, I think it is best to set pc->flags to 0 before setting
the bits. Thanks!

> 
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >  
> I think you should set PCG_ROOT before setting PCG_USED.
> IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.

Kame pointed to something similar, I am going to remove PCG_ROOT in
the next version.

-- 
	Balbir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  9:47             ` Balbir Singh
@ 2009-06-08  0:03               ` Daisuke Nishimura
  0 siblings, 0 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  0:03 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Fri, 5 Jun 2009 17:47:21 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-05 15:05:27]:
> 
> > Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.
> > 
> > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> > >  		css_put(&mem->css);
> > >  		return;
> > >  	}
> > > +
> > >  	pc->mem_cgroup = mem;
> > >  	smp_wmb();
> > > -	pc->flags = pcg_default_flags[ctype];
> > > +	switch (ctype) {
> > > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > > +		SetPageCgroupCache(pc);
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > > +		SetPageCgroupUsed(pc);
> > I think we need ClearPageCgroupCache() here.
> > Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
> > A page can be reused, but we don't cleare PCG_CACHE on free/alloc of page.
> 
> Yes, I know, I think it is best to set pc->flags to 0 before setting
> the bits. Thanks!
> 
I don't think clearing pc->flags is a good idea.
It would also wipe the PCG_ACCT_LRU bit.
ClearPageCgroupCache() before SetPageCgroupUsed() in case of CHARGE_TYPE_MAPPED
would be enough.
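
A small sketch of the difference, using plain (non-atomic) bit arithmetic in
user space. The F_* values and both helper functions are hypothetical and only
mirror the PCG_* names; the real helpers operate atomically on pc->flags.

```c
#include <assert.h>

/* Hypothetical bit layout; it mirrors the PCG_* names, not the kernel's values. */
#define F_CACHE		(1UL << 1)
#define F_USED		(1UL << 2)
#define F_ACCT_LRU	(1UL << 3)

/* Charge as mapped by zeroing the whole word first. */
static unsigned long charge_mapped_zeroing(unsigned long flags)
{
	flags = 0;		/* wipes every bit, including F_ACCT_LRU */
	return flags | F_USED;
}

/* Charge as mapped by clearing only the stale CACHE bit. */
static unsigned long charge_mapped_clear_cache(unsigned long flags)
{
	flags &= ~F_CACHE;	/* drop what a previous cache charge left behind */
	return flags | F_USED;
}
```

For an entry that still carries F_CACHE | F_ACCT_LRU from its previous use, the
first helper returns only F_USED, while the second returns
F_USED | F_ACCT_LRU.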

Thanks,
Daisuke Nishimura.

> > 
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > > +	if (mem == root_mem_cgroup)
> > > +		SetPageCgroupRoot(pc);
> > >  
> > I think you should set PCG_ROOT before setting PCG_USED.
> > IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.
> 
> Kame pointed to something similar, I am going to remove PCG_ROOT in
> the next version.
> 
> -- 
> 	Balbir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
  2009-06-05  6:05           ` Daisuke Nishimura
@ 2009-06-05  6:43           ` Daisuke Nishimura
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  3 siblings, 0 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-05  6:43 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Fri, 5 Jun 2009 13:31:07 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Here is the new version of the patch with the RFC dropped. Andrew,
> Kame, could you please take a look. I am just about to fly out to get
> back home tomorrow, so there might be some silence, unless I get to
> the next WiFi enabled airport.
> 
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Fix and implement review comments.
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> pcg_default_flags is now obsolete, but I've not removed it yet. It
> provides some readability to help the code.
> 
> Tests Results:
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |   12 ++++++++++++
>  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..41cc16c 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(AcctLru, ACCT_LRU)
> +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> +TESTPCGFLAG(AcctLru, ACCT_LRU)
>  
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a83e039..9561d10 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -197,6 +198,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
> @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> +	if (mem == root_mem_cgroup)
> +		ClearPageCgroupRoot(pc);

If we clear PCG_ROOT here, I think we cannot trust PageCgroupRoot() in mem_cgroup_del_lru_list().
And, if we never clear it on free path, we should clear it on commit_charge if mem != root_mem_cgroup.
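
To make the second case concrete, here is a user-space sketch, assuming the
ROOT bit is left set on uncharge so that a later LRU removal can still read it.
pc_model, charge(), uncharge() and flag_says_root() are hypothetical names, and
the bit operations are non-atomic for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; they only mirror the PCG_* names. */
#define F_USED	(1u << 0)
#define F_ROOT	(1u << 1)

struct pc_model {
	const void *owner;	/* models pc->mem_cgroup */
	unsigned int flags;	/* models pc->flags */
};

static void charge(struct pc_model *pc, const void *mem, const void *root,
		   bool clear_stale_root)
{
	if (mem == root)
		pc->flags |= F_ROOT;
	else if (clear_stale_root)
		pc->flags &= ~F_ROOT;	/* drop a bit left by a previous owner */
	pc->owner = mem;
	pc->flags |= F_USED;
}

static void uncharge(struct pc_model *pc)
{
	/* Only USED is dropped; ROOT stays readable for the LRU removal. */
	pc->flags &= ~F_USED;
}

static bool flag_says_root(const struct pc_model *pc)
{
	return (pc->flags & F_ROOT) != 0;
}
```

Recharging a former root page to a child group without the clearing step
leaves flag_says_root() true for a non-root owner, so the child's page would
skip list handling.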

Thanks,
Daisuke Nishimura.

>  	/*
>  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
> @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index ecc3918..4406a9c 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
>  
>  #endif
>  
> -
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  
>  static DEFINE_MUTEX(swap_cgroup_mutex);
> 
> -- 
> 	Balbir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Low overhead patches for the memory cgroup controller (v4)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
                             ` (2 preceding siblings ...)
  2009-06-05  6:43           ` Daisuke Nishimura
@ 2009-06-14 18:37           ` Balbir Singh
  2009-06-15  2:04             ` KAMEZAWA Hiroyuki
  2009-06-15  2:18             ` Daisuke Nishimura
  3 siblings, 2 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-14 18:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Andrew Morton
  Cc: linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Here is v4 of the patches, please review and comment

Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

changelog v4 -> v3
1. Rebase to mmotm 9th June 2009
2. Remove PageCgroupRoot; the account-LRU flag now indicates that
   we do only accounting and no reclaim.
3. pcg_default_flags is used again; since PCGF_ROOT is gone,
   we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
4. More LRU functions are aware of PageCgroupAcctLRU

Changelog v3 -> v2

1. Rebase to mmotm 2nd June 2009
2. Test with some of the test cases recommended by Daisuke-San

Changelog v2 -> v1
1. Rebase to latest mmotm

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup.

Tests:

Results (for v2)

Obtained by

1. Using tmpfs for mounting filesystem
2. Changing sync to be /bin/true (so that sync is not the bottleneck)
3. Used -s #cpus*40 -e #cpus*40

Reaim
		withoutpatch	patch
AIM9		9532.48		9807.59
dbase		19344.60	19285.71
new_dbase	20101.65	20163.13
shared		11827.77	11886.65
compute		17317.38	17420.05

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |    5 ++++
 mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 54 insertions(+), 10 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..57c4d50 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,7 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 
+SETPCGFLAG(AcctLRU, ACCT_LRU)
+CLEARPCGFLAG(AcctLRU, ACCT_LRU)
+TESTPCGFLAG(AcctLRU, ACCT_LRU)
+
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
 	return page_to_nid(pc->page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6ceb6f2..399d416 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 
+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
 					 struct page_cgroup *pc,
 					 bool charge)
@@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	mem = pc->mem_cgroup;
+	if (!mem)
+		return;
+	if (mem_cgroup_is_root(mem)) {
+		if (!PageCgroupAcctLRU(pc))
+			return;
+	} else if (list_empty(&pc->lru))
 		return;
+
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
 	 * removed from global LRU.
 	 */
 	mz = page_cgroup_zoneinfo(pc);
-	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	if (PageCgroupAcctLRU(pc)) {
+		ClearPageCgroupAcctLRU(pc);
+		return;
+	}
 	list_del_init(&pc->lru);
 	return;
 }
@@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	if (mem_cgroup_is_root(pc->mem_cgroup)) {
+		SetPageCgroupAcctLRU(pc);
+		return;
+	}
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
  * it again. This function is only used to charge SwapCache. It's done under
  * lock_page and expected that zone->lru_lock is never held.
  */
-static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
+							struct page_cgroup *pc)
 {
 	unsigned long flags;
 	struct zone *zone = page_zone(page);
-	struct page_cgroup *pc = lookup_page_cgroup(page);
 
+	if (!pc->mem_cgroup ||
+		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
+		return;
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/*
 	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
@@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
 	spin_unlock_irqrestore(&zone->lru_lock, flags);
 }
 
-static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
+							struct page_cgroup *pc)
 {
 	unsigned long flags;
 	struct zone *zone = page_zone(page);
-	struct page_cgroup *pc = lookup_page_cgroup(page);
 
+	if (!pc->mem_cgroup ||
+		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
+		return;
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/* link when the page is linked to LRU but page_cgroup isn't */
 	if (PageLRU(page) && list_empty(&pc->lru))
@@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
 void mem_cgroup_move_lists(struct page *page,
 			   enum lru_list from, enum lru_list to)
 {
+	struct page_cgroup *pc = lookup_page_cgroup(page);
 	if (mem_cgroup_disabled())
 		return;
+	smp_rmb();
+	if (!pc->mem_cgroup ||
+		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
+		return;
 	mem_cgroup_del_lru_list(page, from);
 	mem_cgroup_add_lru_list(page, to);
 }
@@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
 	pc->flags = pcg_default_flags[ctype];
@@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
 	if (!ptr)
 		return;
 	pc = lookup_page_cgroup(page);
-	mem_cgroup_lru_del_before_commit_swapcache(page);
+	smp_rmb();
+	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
 	__mem_cgroup_commit_charge(ptr, pc, ctype);
-	mem_cgroup_lru_add_after_commit_swapcache(page);
+	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
@@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 

-- 
	Balbir


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
@ 2009-06-15  2:04             ` KAMEZAWA Hiroyuki
  2009-06-15  2:18             ` Daisuke Nishimura
  1 sibling, 0 replies; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-15  2:04 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

On Mon, 15 Jun 2009 00:07:40 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Here is v4 of the patches, please review and comment
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> changelog v4 -> v3
> 1. Rebase to mmotm 9th june 2009
> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>    we do only accounting and no reclaim.
> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
> 4. More LRU functions are aware of PageCgroupAcctLRU
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Rebase to latest mmotm
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> 
> Tests:
> 
> Results (for v2)
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 

Hmm, how much overhead does this patch add for non-root cgroups?
It seems to be getting better in general, but I have a few suggestions.


> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |    5 ++++
>  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 54 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..57c4d50 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6ceb6f2..399d416 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>  
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
>  					 struct page_cgroup *pc,
>  					 bool charge)
> @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	mem = pc->mem_cgroup;
> +	if (!mem)
> +		return;
> +	if (mem_cgroup_is_root(mem)) {
> +		if (!PageCgroupAcctLRU(pc))
> +			return;
> +	} else if (list_empty(&pc->lru))
>  		return;
> +
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (PageCgroupAcctLRU(pc)) {
> +		ClearPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_del_init(&pc->lru);
>  	return;
>  }
Looking through the whole code, PageCgroupAcctLRU() is meaningful only when
pc->mem_cgroup == root_mem_cgroup, right?

I wonder whether making PageCgroupAcctLRU() always meaningful and removing all
the list_empty(&pc->lru) checks is the way to go.

If we do so, this function can be written as

==
	if (!PageCgroupAcctLRU(pc))
		return;
	mem = pc->mem_cgroup;
	mz = page_cgroup_zoneinfo(pc);
	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
	ClearPageCgroupAcctLRU(pc);
	/* We don't maintain LRU for root cgroup. Global LRU works for us. */
	if (!mem_cgroup_is_root(mem))
		list_del_init(&pc->lru);
==
This seems much more straightforward.

> @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> +		SetPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
With the rule above (mine), this part becomes:
	SetPageCgroupAcctLRU(pc);
	if (!mem_cgroup_is_root(pc->mem_cgroup))
		list_add(&pc->lru, &mz->lists[lru]);
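
The rule proposed in the two comments above (the AcctLRU flag is set for every accounted page, and only non-root pages are linked into a per-cgroup list) can be modelled in user space. This is a minimal sketch under stated assumptions, not the kernel code: struct list_node, struct model_cgroup, struct model_pc and every model_* function are hypothetical names, and zone->lru_lock is assumed to be held by the caller and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (assumption). */
struct list_node { struct list_node *prev, *next; };

static void node_init(struct list_node *n) { n->prev = n->next = n; }
static bool node_empty(const struct list_node *n) { return n->next == n; }

static void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical model types; the names are illustrative only. */
struct model_cgroup {
	bool is_root;
	long zstat;             /* models MEM_CGROUP_ZSTAT(mz, lru) */
	struct list_node lru;   /* models mz->lists[lru] */
};

struct model_pc {
	struct model_cgroup *mem;
	bool acct_lru;          /* models PCG_ACCT_LRU */
	struct list_node lru;
};

static void model_cgroup_init(struct model_cgroup *mem, bool is_root)
{
	mem->is_root = is_root;
	mem->zstat = 0;
	node_init(&mem->lru);
}

static void model_pc_init(struct model_pc *pc, struct model_cgroup *mem)
{
	pc->mem = mem;
	pc->acct_lru = false;
	node_init(&pc->lru);
}

/* Account the page; link it only when the owner is not the root cgroup. */
static void model_add_lru(struct model_pc *pc)
{
	assert(!pc->acct_lru);
	pc->mem->zstat += 1;
	pc->acct_lru = true;
	if (!pc->mem->is_root)
		node_add(&pc->lru, &pc->mem->lru);
}

/* The flag alone decides whether there is anything to undo. */
static void model_del_lru(struct model_pc *pc)
{
	if (!pc->acct_lru)
		return;
	pc->mem->zstat -= 1;
	pc->acct_lru = false;
	/* Root pages were never linked; the global LRU covers them. */
	if (!pc->mem->is_root)
		node_del_init(&pc->lru);
}
```

In this model a second model_del_lru() call on the same page is a no-op, which is the property the list_empty(&pc->lru) test used to provide for non-root pages.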

> @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>   * it again. This function is only used to charge SwapCache. It's done under
>   * lock_page and expected that zone->lru_lock is never held.
>   */
> -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
The PageCgroupAcctLRU() check is done without holding zone->lru_lock, so checking
the flag here is racy. Considering how "pagevec" works, this race window tends to be large.
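
The concern above is a check-then-act problem: if the flag is tested before zone->lru_lock is taken, a pagevec drain on another CPU can change it between the test and the lock. The sketch below shows the alternative of deciding inside the lock. It is a user-space model under assumptions: struct model_zone, struct model_pc, the spinlock built on atomic_flag and all model_* functions are hypothetical, and nr_accounted merely stands in for the per-zone statistics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a zone with its lru_lock. */
struct model_zone {
	atomic_flag lru_lock;
	long nr_accounted;
};

struct model_pc {
	bool acct_lru;   /* models PCG_ACCT_LRU; protected by lru_lock */
};

static void model_zone_lock(struct model_zone *z)
{
	while (atomic_flag_test_and_set_explicit(&z->lru_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void model_zone_unlock(struct model_zone *z)
{
	atomic_flag_clear_explicit(&z->lru_lock, memory_order_release);
}

/* Models a pagevec drain: accounts the page under the lock. */
static void model_pagevec_drain(struct model_zone *z, struct model_pc *pc)
{
	model_zone_lock(z);
	if (!pc->acct_lru) {
		pc->acct_lru = true;
		z->nr_accounted += 1;
	}
	model_zone_unlock(z);
}

/*
 * The flag is examined only after the lock is held, so a concurrent
 * drain cannot slip in between the test and the update.
 * Returns true when the page was accounted and has been removed.
 */
static bool model_del_before_commit(struct model_zone *z, struct model_pc *pc)
{
	bool removed = false;

	model_zone_lock(z);
	if (pc->acct_lru) {
		pc->acct_lru = false;
		z->nr_accounted -= 1;
		removed = true;
	}
	model_zone_unlock(z);
	return removed;
}
```

The cost is taking the lock even when there turns out to be nothing to do. Whether that cost is acceptable on this path is a question for the patch author; the sketch only illustrates where the test has to sit to avoid the race.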


>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/*
>  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
>  
> -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;

The same comment as above.


>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
>  	if (PageLRU(page) && list_empty(&pc->lru))
> @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  void mem_cgroup_move_lists(struct page *page,
>  			   enum lru_list from, enum lru_list to)
>  {
> +	struct page_cgroup *pc = lookup_page_cgroup(page);
>  	if (mem_cgroup_disabled())
>  		return;
> +	smp_rmb();
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	mem_cgroup_del_lru_list(page, from);
>  	mem_cgroup_add_lru_list(page, to);
>  }
Here, too.


> @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
>  	pc->flags = pcg_default_flags[ctype];
> @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	if (!ptr)
>  		return;
>  	pc = lookup_page_cgroup(page);
> -	mem_cgroup_lru_del_before_commit_swapcache(page);
> +	smp_rmb();
> +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
>  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> -	mem_cgroup_lru_add_after_commit_swapcache(page);
> +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);

Why this change? When you add a memory barrier, please add a comment explaining it.
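
As an illustration of the kind of comment being asked for, here is a user-space sketch of a write barrier and a read barrier that pair with each other, each commented with what it orders and which barrier it pairs with. It uses C11 fences as stand-ins: a release fence is at least as strong as smp_wmb() and an acquire fence at least as strong as smp_rmb(). The struct, the MODEL_USED bit and the function names are hypothetical, and the fence placement is the conventional publish/consume pairing, not necessarily the placement in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_USED 0x1UL	/* models the PCG_USED bit (assumption) */

struct model_pc {
	_Atomic(void *) mem_cgroup;
	atomic_ulong flags;
};

/* Writer side: publish the owner first, then mark the object as used. */
static void model_commit_charge(struct model_pc *pc, void *mem)
{
	atomic_store_explicit(&pc->mem_cgroup, mem, memory_order_relaxed);
	/*
	 * Orders the mem_cgroup store before the flags store.
	 * Pairs with the acquire fence in model_lookup_owner().
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&pc->flags, MODEL_USED, memory_order_relaxed);
}

/* Reader side: returns the owner, or NULL when the object is unused. */
static void *model_lookup_owner(struct model_pc *pc)
{
	unsigned long flags;

	flags = atomic_load_explicit(&pc->flags, memory_order_relaxed);
	/*
	 * Orders the flags load before the mem_cgroup load.
	 * Pairs with the release fence in model_commit_charge().
	 */
	atomic_thread_fence(memory_order_acquire);
	if (!(flags & MODEL_USED))
		return NULL;
	return atomic_load_explicit(&pc->mem_cgroup, memory_order_relaxed);
}
```

A reader that observes MODEL_USED is then guaranteed to observe the owner pointer stored before it.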


>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}

Could you add the corresponding change to Documentation in the next post?


>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  

Could you start a new thread with the next post? Once I read this and it goes from
unread to read, it ends up buried deep in the old mail tree ;)


Regards,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  2009-06-15  2:04             ` KAMEZAWA Hiroyuki
@ 2009-06-15  2:18             ` Daisuke Nishimura
  2009-06-15  2:23               ` KAMEZAWA Hiroyuki
  2009-06-15  3:00               ` Balbir Singh
  1 sibling, 2 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  2:18 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Here is v4 of the patches, please review and comment
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> changelog v4 -> v3
> 1. Rebase to mmotm 9th june 2009
> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>    we do only accounting and no reclaim.
Hmm, I prefer the meaning PCG_ACCT_LRU had in the previous version. It can be
used to remove the annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.

> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
outside of zone->lru_lock.

IMHO, the most complicated case is a SwapCache page which has been read ahead by
a *different* cpu from the cpu doing do_swap_page(). Such SwapCache pages can sit
on a pagevec and be drained to the LRU asynchronously with respect to do_swap_page().
Well, yes, it would be safe just because PCGF_ACCT_LRU would not be set
if PCGF_USED has not been set, but I don't think it's a good idea to touch
PCGF_ACCT_LRU outside of zone->lru_lock anyway.


Wouldn't a patch like the one below work for you?
It was lightly tested under global memory pressure (without memcg memory pressure)
on a small machine (though it has been modified a bit since then).
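
For readers unfamiliar with the test-and-clear idiom, here is a user-space sketch of what a helper such as TestClearPageCgroupAcctLRU() provides: the bit is tested and cleared in one atomic step, and only the caller that actually cleared it sees "true". This is a model under assumptions, written with C11 atomics rather than the kernel's test_and_clear_bit(); the MODEL_* names and model_* functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers mirroring the enum in page_cgroup.h (model only). */
enum {
	MODEL_PCG_LOCK,
	MODEL_PCG_CACHE,
	MODEL_PCG_USED,
	MODEL_PCG_ACCT_LRU,
};

struct model_pc {
	atomic_ulong flags;
};

/* Atomically set one bit, leaving the others untouched. */
static void model_set_bit(int nr, struct model_pc *pc)
{
	atomic_fetch_or(&pc->flags, 1UL << nr);
}

/* Read one bit without modifying the word. */
static bool model_test_bit(int nr, struct model_pc *pc)
{
	return (atomic_load(&pc->flags) >> nr) & 1UL;
}

/*
 * Atomically clear one bit and report whether it was set beforehand.
 * Of several concurrent callers, exactly one sees true.
 */
static bool model_test_and_clear_bit(int nr, struct model_pc *pc)
{
	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_and(&pc->flags, ~mask);

	return (old & mask) != 0;
}
```

Per-bit updates also leave the other flags alone, whereas a whole-word store such as pc->flags = pcg_default_flags[ctype] overwrites every bit, including one set by the LRU code. That is presumably why the patch below replaces that assignment with individual Set/Clear calls, though the message does not say so explicitly.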

===
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 include/linux/page_cgroup.h |   13 ++++++++++
 mm/memcontrol.c             |   54 +++++++++++++++++++++++++++++++-----------
 2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..debd8ba 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,7 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -40,11 +41,23 @@ static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
 static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 	{ clear_bit(PCG_##lname, &pc->flags);  }
 
+#define TESTCLEARPCGFLAG(uname, lname)			\
+static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
+	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
+
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+CLEARPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(AcctLRU, ACCT_LRU)
+CLEARPCGFLAG(AcctLRU, ACCT_LRU)
+TESTPCGFLAG(AcctLRU, ACCT_LRU)
+TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dbece65..820f3e6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -200,13 +201,8 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
-static const unsigned long
-pcg_default_flags[NR_CHARGE_TYPE] = {
-	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
-	PCGF_USED | PCGF_LOCK, /* Anon */
-	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* Shmem */
-	0, /* FORCE */
-};
+/* Not used, but added here for completeness */
+#define PCGF_ACCT_LRU	(1UL << PCG_ACCT_LRU)
 
 /* for encoding cft->private value on file */
 #define _MEM			(0)
@@ -354,6 +350,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 	return ret;
 }
 
+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -371,22 +372,24 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 {
 	struct page_cgroup *pc;
-	struct mem_cgroup *mem;
 	struct mem_cgroup_per_zone *mz;
 
 	if (mem_cgroup_disabled())
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if (!TestClearPageCgroupAcctLRU(pc))
 		return;
+	VM_BUG_ON(!pc->mem_cgroup);
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
 	 * removed from global LRU.
 	 */
 	mz = page_cgroup_zoneinfo(pc);
-	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	if (mem_cgroup_is_root(pc->mem_cgroup))
+		return;
+	VM_BUG_ON(list_empty(&pc->lru));
 	list_del_init(&pc->lru);
 	return;
 }
@@ -410,8 +413,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -425,6 +428,7 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 	if (mem_cgroup_disabled())
 		return;
 	pc = lookup_page_cgroup(page);
+	VM_BUG_ON(PageCgroupAcctLRU(pc));
 	/*
 	 * Used bit is set without atomic ops but after smp_wmb().
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
@@ -435,6 +439,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcctLRU(pc);
+	if (mem_cgroup_is_root(pc->mem_cgroup))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -469,7 +476,7 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
 
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/* link when the page is linked to LRU but page_cgroup isn't */
-	if (PageLRU(page) && list_empty(&pc->lru))
+	if (PageLRU(page) && !PageCgroupAcctLRU(pc))
 		mem_cgroup_add_lru_list(page, page_lru(page));
 	spin_unlock_irqrestore(&zone->lru_lock, flags);
 }
@@ -1106,9 +1113,22 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		ClearPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -2047,6 +2067,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2513,6 +2537,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2541,6 +2566,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
 
===


Thanks,
Daisuke Nishimura.

> 4. More LRU functions are aware of PageCgroupAcctLRU
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Rebase to latest mmotm
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup.
> 
> Tests:
> 
> Results (for v2)
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |    5 ++++
>  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 54 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..57c4d50 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6ceb6f2..399d416 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>  
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
>  					 struct page_cgroup *pc,
>  					 bool charge)
> @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	mem = pc->mem_cgroup;
> +	if (!mem)
> +		return;
> +	if (mem_cgroup_is_root(mem)) {
> +		if (!PageCgroupAcctLRU(pc))
> +			return;
> +	} else if (list_empty(&pc->lru))
>  		return;
> +
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (PageCgroupAcctLRU(pc)) {
> +		ClearPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> +		SetPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>   * it again. This function is only used to charge SwapCache. It's done under
>   * lock_page and expected that zone->lru_lock is never held.
>   */
> -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/*
>  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
>  
> -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
>  	if (PageLRU(page) && list_empty(&pc->lru))
> @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  void mem_cgroup_move_lists(struct page *page,
>  			   enum lru_list from, enum lru_list to)
>  {
> +	struct page_cgroup *pc = lookup_page_cgroup(page);
>  	if (mem_cgroup_disabled())
>  		return;
> +	smp_rmb();
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	mem_cgroup_del_lru_list(page, from);
>  	mem_cgroup_add_lru_list(page, to);
>  }
> @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
>  	pc->flags = pcg_default_flags[ctype];
> @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	if (!ptr)
>  		return;
>  	pc = lookup_page_cgroup(page);
> -	mem_cgroup_lru_del_before_commit_swapcache(page);
> +	smp_rmb();
> +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
>  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> -	mem_cgroup_lru_add_after_commit_swapcache(page);
> +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> 
> -- 
> 	Balbir


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:18             ` Daisuke Nishimura
@ 2009-06-15  2:23               ` KAMEZAWA Hiroyuki
  2009-06-15  2:44                 ` Balbir Singh
  2009-06-15  3:00               ` Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-15  2:23 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: balbir, Andrew Morton, linux-mm@kvack.org, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

On Mon, 15 Jun 2009 11:18:17 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > Here is v4 of the patches, please review and comment
> > 
> > Feature: Remove the overhead associated with the root cgroup
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > changelog v4 -> v3
> > 1. Rebase to mmotm 9th june 2009
> > 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
> >    we do only accounting and no reclaim.
> Hmm, I prefer the meaning PCG_ACCT_LRU had in the previous version. It can be
> used to remove the annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.
> 
> > 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
> >    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
> It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
> outside of zone->lru_lock.
> 
> IMHO, the most complicated case is a SwapCache page which has been read ahead by
> a *different* cpu from the cpu doing do_swap_page(). Such SwapCache pages can sit
> on a pagevec and be drained to the LRU asynchronously with respect to do_swap_page().
> Well, yes, it would be safe just because PCGF_ACCT_LRU would not be set
> if PCGF_USED has not been set, but I don't think it's a good idea to touch
> PCGF_ACCT_LRU outside of zone->lru_lock anyway.
> 
> 
> Wouldn't a patch like the one below work for you?
> It was lightly tested under global memory pressure (without memcg memory pressure)
> on a small machine (though it has been modified a bit since then).
> 
This patch includes almost everything I want ;)

Thanks,
-Kame


> ===
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
>  include/linux/page_cgroup.h |   13 ++++++++++
>  mm/memcontrol.c             |   54 +++++++++++++++++++++++++++++++-----------
>  2 files changed, 53 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..debd8ba 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -40,11 +41,23 @@ static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
>  static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  	{ clear_bit(PCG_##lname, &pc->flags);  }
>  
> +#define TESTCLEARPCGFLAG(uname, lname)			\
> +static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
> +	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
> +
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +CLEARPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
>  
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index dbece65..820f3e6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -200,13 +201,8 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> -static const unsigned long
> -pcg_default_flags[NR_CHARGE_TYPE] = {
> -	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> -	PCGF_USED | PCGF_LOCK, /* Anon */
> -	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* Shmem */
> -	0, /* FORCE */
> -};
> +/* Not used, but added here for completeness */
> +#define PCGF_ACCT_LRU	(1UL << PCG_ACCT_LRU)
>  
>  /* for encoding cft->private value on file */
>  #define _MEM			(0)
> @@ -354,6 +350,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
>  	return ret;
>  }
>  
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  /*
>   * Following LRU functions are allowed to be used without PCG_LOCK.
>   * Operations are called by routine of global LRU independently from memcg.
> @@ -371,22 +372,24 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
>  void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  {
>  	struct page_cgroup *pc;
> -	struct mem_cgroup *mem;
>  	struct mem_cgroup_per_zone *mz;
>  
>  	if (mem_cgroup_disabled())
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if (!TestClearPageCgroupAcctLRU(pc))
>  		return;
> +	VM_BUG_ON(!pc->mem_cgroup);
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup))
> +		return;
> +	VM_BUG_ON(list_empty(&pc->lru));
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -410,8 +413,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -425,6 +428,7 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  	if (mem_cgroup_disabled())
>  		return;
>  	pc = lookup_page_cgroup(page);
> +	VM_BUG_ON(PageCgroupAcctLRU(pc));
>  	/*
>  	 * Used bit is set without atomic ops but after smp_wmb().
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> @@ -435,6 +439,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcctLRU(pc);
> +	if (mem_cgroup_is_root(pc->mem_cgroup))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -469,7 +476,7 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
> -	if (PageLRU(page) && list_empty(&pc->lru))
> +	if (PageLRU(page) && !PageCgroupAcctLRU(pc))
>  		mem_cgroup_add_lru_list(page, page_lru(page));
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
> @@ -1106,9 +1113,22 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		ClearPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
> @@ -2047,6 +2067,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2513,6 +2537,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2541,6 +2566,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
>  
> ===
> 
> 
> Thanks,
> Daisuke Nishimura.
> 
> > 4. More LRU functions are aware of PageCgroupAcctLRU
> > 
> > Changelog v3 -> v2
> > 
> > 1. Rebase to mmotm 2nd June 2009
> > 2. Test with some of the test cases recommended by Daisuke-San
> > 
> > Changelog v2 -> v1
> > 1. Rebase to latest mmotm
> > 
> > This patch changes the memory cgroup and removes the overhead associated
> > with accounting all pages in the root cgroup. As a side-effect, we can
> > no longer set a memory hard limit in the root cgroup.
> > 
> > A new flag to track whether the page has been accounted or not
> > has been added as well. Flags are now set atomically for page_cgroup.
> > 
> > Tests:
> > 
> > Results (for v2)
> > 
> > Obtained by
> > 
> > 1. Using tmpfs for mounting filesystem
> > 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> > 3. Used -s #cpus*40 -e #cpus*40
> > 
> > Reaim
> > 		withoutpatch	patch
> > AIM9		9532.48		9807.59
> > dbase		19344.60	19285.71
> > new_dbase	20101.65	20163.13
> > shared		11827.77	11886.65
> > compute		17317.38	17420.05
> > 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> > 
> >  include/linux/page_cgroup.h |    5 ++++
> >  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
> >  2 files changed, 54 insertions(+), 10 deletions(-)
> > 
> > 
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..57c4d50 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,7 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ACCT_LRU, /* page has been accounted for */
> >  };
> >  
> >  #define TESTPCGFLAG(uname, lname)			\
> > @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> >  
> > +SETPCGFLAG(AcctLRU, ACCT_LRU)
> > +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> > +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> > +
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> >  	return page_to_nid(pc->page);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 6ceb6f2..399d416 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >  
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >  
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> > @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
> >  static void mem_cgroup_put(struct mem_cgroup *mem);
> >  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> >  
> > +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> > +{
> > +	return (mem == root_mem_cgroup);
> > +}
> > +
> >  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> >  					 struct page_cgroup *pc,
> >  					 bool charge)
> > @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> >  	/* can happen while we handle swapcache. */
> > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +	mem = pc->mem_cgroup;
> > +	if (!mem)
> > +		return;
> > +	if (mem_cgroup_is_root(mem)) {
> > +		if (!PageCgroupAcctLRU(pc))
> > +			return;
> > +	} else if (list_empty(&pc->lru))
> >  		return;
> > +
> >  	/*
> >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> >  	 * removed from global LRU.
> >  	 */
> >  	mz = page_cgroup_zoneinfo(pc);
> > -	mem = pc->mem_cgroup;
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +	if (PageCgroupAcctLRU(pc)) {
> > +		ClearPageCgroupAcctLRU(pc);
> > +		return;
> > +	}
> >  	list_del_init(&pc->lru);
> >  	return;
> >  }
> > @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >  	 */
> >  	smp_rmb();
> > -	/* unused page is not rotated. */
> > -	if (!PageCgroupUsed(pc))
> > +	/* unused or root page is not rotated. */
> > +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >  
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> > +		SetPageCgroupAcctLRU(pc);
> > +		return;
> > +	}
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> >  
> > @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >   * it again. This function is only used to charge SwapCache. It's done under
> >   * lock_page and expected that zone->lru_lock is never held.
> >   */
> > -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> > +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> > +							struct page_cgroup *pc)
> >  {
> >  	unsigned long flags;
> >  	struct zone *zone = page_zone(page);
> > -	struct page_cgroup *pc = lookup_page_cgroup(page);
> >  
> > +	if (!pc->mem_cgroup ||
> > +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> > +		return;
> >  	spin_lock_irqsave(&zone->lru_lock, flags);
> >  	/*
> >  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> > @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> >  	spin_unlock_irqrestore(&zone->lru_lock, flags);
> >  }
> >  
> > -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> > +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> > +							struct page_cgroup *pc)
> >  {
> >  	unsigned long flags;
> >  	struct zone *zone = page_zone(page);
> > -	struct page_cgroup *pc = lookup_page_cgroup(page);
> >  
> > +	if (!pc->mem_cgroup ||
> > +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> > +		return;
> >  	spin_lock_irqsave(&zone->lru_lock, flags);
> >  	/* link when the page is linked to LRU but page_cgroup isn't */
> >  	if (PageLRU(page) && list_empty(&pc->lru))
> > @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> >  void mem_cgroup_move_lists(struct page *page,
> >  			   enum lru_list from, enum lru_list to)
> >  {
> > +	struct page_cgroup *pc = lookup_page_cgroup(page);
> >  	if (mem_cgroup_disabled())
> >  		return;
> > +	smp_rmb();
> > +	if (!pc->mem_cgroup ||
> > +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> > +		return;
> >  	mem_cgroup_del_lru_list(page, from);
> >  	mem_cgroup_add_lru_list(page, to);
> >  }
> > @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> >  	pc->flags = pcg_default_flags[ctype];
> > @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> >  	if (!ptr)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> > -	mem_cgroup_lru_del_before_commit_swapcache(page);
> > +	smp_rmb();
> > +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
> >  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> > -	mem_cgroup_lru_add_after_commit_swapcache(page);
> > +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
> >  	/*
> >  	 * Now swap is on-memory. This means this page may be
> >  	 * counted both as mem and swap....double count.
> > @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> >  	name = MEMFILE_ATTR(cft->private);
> >  	switch (name) {
> >  	case RES_LIMIT:
> > +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >  		/* This function does all necessary parse...reuse it */
> >  		ret = res_counter_memparse_write_strategy(buffer, &val);
> >  		if (ret)
> > @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	if (cont->parent == NULL) {
> >  		enable_swap_cgroup();
> >  		parent = NULL;
> > +		root_mem_cgroup = mem;
> >  	} else {
> >  		parent = mem_cgroup_from_cont(cont->parent);
> >  		mem->use_hierarchy = parent->use_hierarchy;
> > @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	return &mem->css;
> >  free_out:
> >  	__mem_cgroup_free(mem);
> > +	root_mem_cgroup = NULL;
> >  	return ERR_PTR(error);
> >  }
> >  
> > 
> > -- 
> > 	Balbir
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:23               ` KAMEZAWA Hiroyuki
@ 2009-06-15  2:44                 ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  2:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

KAMEZAWA Hiroyuki wrote:
> On Mon, 15 Jun 2009 11:18:17 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
>> On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>> Here is v4 of the patches, please review and comment
>>>
>>> Feature: Remove the overhead associated with the root cgroup
>>>
>>> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>>>
>>> changelog v4 -> v3
>>> 1. Rebase to mmotm 9th june 2009
>>> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>>>    we do only accounting and no reclaim.
>> hmm, I prefer the previous version of PCG_ACCT_LRU meaning. It can be
>> used to remove annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.
>>
>>> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>>>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
>> It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
>> outside of zone->lru_lock.
>>
>> IMHO, the most complicated case is a SwapCache which has been read ahead by
>> a *different* cpu from the cpu doing do_swap_page(). Those SwapCache can be
>> on page_vec and be drained to LRU asymmetrically with do_swap_page().
>> Well, yes it would be safe just because PCGF_ACCT_LRU would not be set
>> if PCGF_USED has not been set, but I don't think it's a good idea to touch
>> PCGF_ACCT_LRU outside of zone->lru_lock anyway.
>>
>>
>> Doesn't a patch like below work for you ?
>> Lightly tested under global memory pressure(w/o memcg's memory pressure)
>> on a small machine(just a bit modified from then though).
>>

OK, so you prefer the older meaning and implementation. The code seems fine to
me, and I like the removal of the list_empty() checks that you and Kame have
proposed.
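
For reference, the flag-based check can be modelled in a few lines of userspace
C. This is only a sketch under stated assumptions: struct pc_model,
model_add_lru() and model_del_lru() are hypothetical stand-ins, not the kernel's
page_cgroup code, and the plain (non-atomic) bit operations assume the caller
already holds the lock that serializes LRU updates (zone->lru_lock in this
thread).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct page_cgroup. */
enum { MODEL_ACCT_LRU_BIT = 0 };

struct pc_model {
	unsigned long flags;
	bool is_root;	/* stands in for mem_cgroup_is_root(pc->mem_cgroup) */
	bool on_list;	/* stands in for !list_empty(&pc->lru) */
	long *zstat;	/* stands in for MEM_CGROUP_ZSTAT(mz, lru) */
};

/* Non-atomic test-and-clear; assumes the LRU lock is held by the caller. */
static bool test_clear_acct_lru(struct pc_model *pc)
{
	unsigned long mask = 1UL << MODEL_ACCT_LRU_BIT;
	bool was_set = (pc->flags & mask) != 0;

	pc->flags &= ~mask;
	return was_set;
}

static void model_add_lru(struct pc_model *pc)
{
	/* Adding an already accounted entry would be a bug. */
	assert(!(pc->flags & (1UL << MODEL_ACCT_LRU_BIT)));
	*pc->zstat += 1;
	pc->flags |= 1UL << MODEL_ACCT_LRU_BIT;
	if (pc->is_root)
		return;		/* root: accounted, but never linked */
	pc->on_list = true;
}

static void model_del_lru(struct pc_model *pc)
{
	/* The flag alone says whether there is anything to undo. */
	if (!test_clear_acct_lru(pc))
		return;
	*pc->zstat -= 1;
	if (pc->is_root)
		return;
	assert(pc->on_list);
	pc->on_list = false;
}
```

With this shape, a second model_del_lru() on the same entry returns early on
the flag alone, for root and non-root entries alike, which is the role the
list_empty() test used to play.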


-- 
	Balbir


* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:18             ` Daisuke Nishimura
  2009-06-15  2:23               ` KAMEZAWA Hiroyuki
@ 2009-06-15  3:00               ` Balbir Singh
  2009-06-15  3:09                 ` Daisuke Nishimura
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  3:00 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Daisuke Nishimura wrote:

>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];

pc->flags needs to be reset here; otherwise we risk carrying over older bits.
I'll merge your changes and test.


-- 
	Balbir


* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:00               ` Balbir Singh
@ 2009-06-15  3:09                 ` Daisuke Nishimura
  2009-06-15  3:22                   ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  3:09 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Daisuke Nishimura wrote:
> 
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> 
> pc->flags needs to be reset here; otherwise we risk carrying over older bits.
> I'll merge your changes and test.
> 
hmm, why ?

I do in my patch:

+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		ClearPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}

So, all the necessary flags are set and all the unnecessary ones are cleared, right ?


* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:09                 ` Daisuke Nishimura
@ 2009-06-15  3:22                   ` Balbir Singh
  2009-06-15  3:46                     ` Daisuke Nishimura
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  3:22 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Daisuke Nishimura wrote:
> On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Daisuke Nishimura wrote:
>>
>>>  	pc->mem_cgroup = mem;
>>>  	smp_wmb();
>>> -	pc->flags = pcg_default_flags[ctype];
>> pc->flags needs to be reset here; otherwise we risk carrying over older bits.
>> I'll merge your changes and test.
>>
> hmm, why ?
> 
> I do in my patch:
> 
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		ClearPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> 

Yes, I did that in the older code. What I was suggesting was just an additional
step, so that if we add new flags in the future we don't end up with a long list
of initializations and clearings, and so that a forgotten clear of pc->flags does
not become a problem when the page_cgroup is reused. My message was confusing; it
should have said that resetting pc->flags protects against any future addition
of flags.
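
To make the concern concrete, here is a small userspace sketch comparing the
two commit styles for a mapped page. It is an illustration only: PCG_FUTURE is
a hypothetical flag that no patch in this thread adds, the helper names are
made up, and the whole-word value for a mapped page is assumed to be
PCGF_USED | PCGF_LOCK.

```c
#include <assert.h>

/* Bit numbers follow the enum in include/linux/page_cgroup.h from the patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };
/* Hypothetical flag that some later patch might add; not part of this thread. */
enum { PCG_FUTURE = 8 };

#define PCGF(bit) (1UL << (bit))

/* Per-bit commit for a mapped page: touches only the bits it names. */
static unsigned long commit_mapped_by_bits(unsigned long flags)
{
	flags &= ~PCGF(PCG_CACHE);
	flags |= PCGF(PCG_USED);
	return flags;
}

/* Whole-word commit for a mapped page: every other bit is dropped. */
static unsigned long commit_mapped_by_assignment(unsigned long flags)
{
	(void)flags;	/* the old value is deliberately ignored */
	return PCGF(PCG_USED) | PCGF(PCG_LOCK);
}
```

A stale PCG_FUTURE bit survives the per-bit version and is dropped by the
assignment. The same property means the assignment would also drop
PCG_ACCT_LRU if it happened to be set.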

I am testing your patch, which is the modified version of v3 with your changes,
and I would like to add your Signed-off-by when I post v5. Is that OK?

-- 
	Balbir


* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:22                   ` Balbir Singh
@ 2009-06-15  3:46                     ` Daisuke Nishimura
  2009-06-15  4:22                       ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  3:46 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro,
	Daisuke Nishimura

On Mon, 15 Jun 2009 08:52:56 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Daisuke Nishimura wrote:
> > On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >> Daisuke Nishimura wrote:
> >>
> >>>  	pc->mem_cgroup = mem;
> >>>  	smp_wmb();
> >>> -	pc->flags = pcg_default_flags[ctype];
> >> pc->flags needs to be reset here; otherwise we risk carrying over older bits.
> >> I'll merge your changes and test.
> >>
> > hmm, why ?
> > 
> > I do in my patch:
> > 
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		ClearPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > 
> 
> Yes, I did that in the older code, what I was suggesting was just an additional
> step to ensure that in the future if we add new flags, we don't end up with a
> long list of initializations and clearing or if we forget to clear pc->flags and
> reuse the page_cgroup, it might be a problem. My message was confusing, it
> should have been resetting the pc->flags will provide protection for any future
> addition of flags.
> 
O.K. I see your point.

But we shouldn't touch the PCG_ACCT_LRU flag here. IIUC, that's why we abandoned
pcg_default_flags[]. Please take care of it.
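
As a purely hypothetical illustration (neither patch in this thread does this),
a reset that clears unknown bits while leaving selected ones alone could be
written as a compare-and-exchange loop. The sketch below uses C11 atomics in
userspace; reset_flags_keeping() and its parameters are made-up names, and a
real version would also have to keep any other bit that must survive, such as
the lock bit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers follow the enum in include/linux/page_cgroup.h from the patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };

#define PCGF(bit) (1UL << (bit))

/*
 * Hypothetical helper: replace every bit except those in keep_mask with
 * new_bits, retrying if another thread changed the word in between.
 */
static void reset_flags_keeping(atomic_ulong *flags, unsigned long keep_mask,
				unsigned long new_bits)
{
	unsigned long old = atomic_load(flags);
	unsigned long want;

	do {
		/* On failure, old is refreshed with the current value. */
		want = (old & keep_mask) | new_bits;
	} while (!atomic_compare_exchange_weak(flags, &old, want));
}
```

Called with keep_mask = PCGF(PCG_ACCT_LRU), this leaves the LRU accounting bit
as it was and replaces everything else. Whether the extra atomic operation is
acceptable on the charge path is a separate question.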

> I am testing your patch which is the modified version of v3 with your changes
> and have your signed-off-by in it as well as I post v5. Is that OK?
> 
Sure :)


Thanks,
Daisuke Nishimura.


* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:46                     ` Daisuke Nishimura
@ 2009-06-15  4:22                       ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-15  4:22 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm@kvack.org,
	lizf@cn.fujitsu.com, menage@google.com, KOSAKI Motohiro

Daisuke Nishimura wrote:
> On Mon, 15 Jun 2009 08:52:56 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Daisuke Nishimura wrote:
>>> On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>>> Daisuke Nishimura wrote:
>>>>
>>>>>  	pc->mem_cgroup = mem;
>>>>>  	smp_wmb();
>>>>> -	pc->flags = pcg_default_flags[ctype];
>>>> pc->flags needs to be reset here; otherwise we risk carrying over older bits.
>>>> I'll merge your changes and test.
>>>>
>>> hmm, why ?
>>>
>>> I do in my patch:
>>>
>>> +	switch (ctype) {
>>> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
>>> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
>>> +		SetPageCgroupCache(pc);
>>> +		SetPageCgroupUsed(pc);
>>> +		break;
>>> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
>>> +		ClearPageCgroupCache(pc);
>>> +		SetPageCgroupUsed(pc);
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>>
>> Yes, I did that in the older code, what I was suggesting was just an additional
>> step to ensure that in the future if we add new flags, we don't end up with a
>> long list of initializations and clearing or if we forget to clear pc->flags and
>> reuse the page_cgroup, it might be a problem. My message was confusing, it
>> should have been resetting the pc->flags will provide protection for any future
>> addition of flags.
>>
> O.K. I see your point.
> 
> But we shouldn't touch PCG_ACCT_LRU flag here. IIUC, that's why we abandon
> pcg_default_flags[]. Please take care of it.
> 

I am keeping the pc->flags assignment removed, as in the earlier patch, but this
is something to keep in mind as we review further changes to the flags field.

>> I am testing your patch which is the modified version of v3 with your changes
>> and have your signed-off-by in it as well as I post v5. Is that OK?
>>
> Sure :)
> 

Just sending it out now. Thanks!

-- 
	Balbir


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 17:45 [RFC] Low overhead patches for the memory cgroup controller (v2) KAMEZAWA Hiroyuki
  2009-05-15 18:16 ` Balbir Singh
@ 2009-05-17  4:15 ` Balbir Singh
  2009-06-01  4:25   ` Daisuke Nishimura
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2009-05-17  4:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	nishimura@mxp.nes.nec.co.jp, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> I think set/clear flag here adds a race condition, because pc->flags is
> modified by
>   pc->flags = pcg_default_flags[ctype] in commit_charge()
> you have to modify above lines to be
> 
>   SetPageCgroupCache(pc) or some..
>   ...
>   SetPageCgroupUsed(pc)
> 
> Then, you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so,
>  non atomic code is used.)
>

Here is the next version of the patch


Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.
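
The side effect can be pictured with a small userspace model. This is a sketch
only: struct memcg_model, root_model and model_write_limit() are hypothetical
names, and the real check sits in the RES_LIMIT case of mem_cgroup_write() in
the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct memcg_model {
	unsigned long long limit;
};

/* Plays the role of root_mem_cgroup; set when the root group is created. */
static struct memcg_model *root_model;

static int model_is_root(const struct memcg_model *m)
{
	return m == root_model;
}

/* Returns 0 on success, -EINVAL when asked to limit the root group. */
static int model_write_limit(struct memcg_model *m, unsigned long long val)
{
	if (model_is_root(m))
		return -EINVAL;	/* can't set a limit on root */
	m->limit = val;
	return 0;
}
```

A limit write to the root group fails with -EINVAL and leaves its limit
untouched, while the same write to any other group succeeds.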

A new flag is used to track page_cgroups associated with root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup.
pcg_default_flags is now obsolete, but I've not removed it yet, since it
helps the readability of the code.

Tests:
1. Tested lightly; previous versions showed a good performance improvement of 10%.

NOTE:
I haven't got the time right now to run oprofile and get detailed test results,
since I am in the middle of travel.

Please review the code for functional correctness; if you can test
it, even better. I would like to push this in, especially if the percentage
performance difference I am seeing is reproducible elsewhere as well.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   12 ++++++++++++
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
 mm/page_cgroup.c            |    1 -
 3 files changed, 50 insertions(+), 5 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..ebdae9a 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(Acct, ACCT)
+CLEARPCGFLAG(Acct, ACCT)
+TESTPCGFLAG(Acct, ACCT)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9712ef7..35415fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
@@ -196,6 +197,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
+
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 09b73c5..6145ff6 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);
 

-- 
	Balbir


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
@ 2009-06-01  4:25   ` Daisuke Nishimura
  2009-06-01  5:01     ` Daisuke Nishimura
  2009-06-01  5:49     ` Balbir Singh
  0 siblings, 2 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-01  4:25 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro, Daisuke Nishimura

I'm sorry for my very late reply.

I've been working on the stale swap cache problem for a long time as you know :)

On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> 
> > I think setting/clearing the flag here adds a race condition, because pc->flags is
> > modified by
> >   pc->flags = pcg_default_flags[ctype] in commit_charge().
> > You have to change the above line to
> > 
> >   SetPageCgroupCache(pc) or similar..
> >   ...
> >   SetPageCgroupUsed(pc)
> > 
> > Then, you can use set_bit() without lock_page_cgroup().
> > (Currently, pc->flags is modified only under lock_page_cgroup(), so
> >  non-atomic code is used.)
> >
> 
> Here is the next version of the patch
> 
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
I agree with the idea itself.

> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> pcg_default_flags is now obsolete, but I've not removed it yet. It
> provides some readability to help the code.
> 
> Tests:
> 1. Tested lightly; previous versions showed a good performance improvement (10%).
> 
You should test the current version :)
I also think you should test this patch under global memory pressure, to
check that it doesn't cause bugs or underflow/overflow of something, etc.
memcg's LRU handling of SwapCache differs from the usual one.

> NOTE:
> I haven't got the time right now to run oprofile and get detailed test results,
> since I am in the middle of travel.
> 
> Please review the code for functional correctness; if you can test
> it, even better. I would like to push this in, especially if the %
> performance difference I am seeing is reproducible elsewhere as well.
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |   12 ++++++++++++
>  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..ebdae9a 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT, /* page has been accounted for */
>  };
>  
Those new flags are protected by zone->lru_lock, right?
If so, please add a comment saying so.
Also, I'm not sure why you need two flags. Isn't PCG_ROOT enough for you?
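
To make the locking question concrete, here is a small user-space sketch (not kernel code). It assumes the LRU paths run under zone->lru_lock while the charge path runs under lock_page_cgroup(), so two different lock holders can touch the same flags word. The function names and the PCG_BIT() macro are made up for illustration, and C11 atomic_fetch_or() only stands in for the kernel's set_bit(). The interleaving is replayed sequentially so the result is deterministic:

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers mirror the enum in the quoted patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };

/* Hypothetical helper macro, not from the patch. */
#define PCG_BIT(n) (1UL << (n))

/*
 * Replays a lost update: the charge path does a plain read-modify-write
 * while the LRU path sets PCG_ACCT in between.
 */
static unsigned long interleave_plain(void)
{
	unsigned long flags = 0;
	unsigned long snapshot = flags;          /* charge path loads the word */

	flags |= PCG_BIT(PCG_ACCT);              /* LRU path sets its bit */
	flags = snapshot | PCG_BIT(PCG_USED);    /* charge path stores a stale copy */
	return flags;
}

/* Same two updates done as atomic read-modify-write operations. */
static unsigned long interleave_atomic(void)
{
	atomic_ulong flags = 0;

	atomic_fetch_or(&flags, PCG_BIT(PCG_ACCT));   /* LRU path */
	atomic_fetch_or(&flags, PCG_BIT(PCG_USED));   /* charge path */
	return atomic_load(&flags);
}
```

In the plain replay PCG_ACCT is lost; with atomic bit operations both bits survive whichever path runs first.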

>  #define TESTPCGFLAG(uname, lname)			\
> @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(Acct, ACCT)
> +CLEARPCGFLAG(Acct, ACCT)
> +TESTPCGFLAG(Acct, ACCT)
>  
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9712ef7..35415fc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
> @@ -196,6 +197,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
Shouldn't we set PCG_LOCK ?
unlock_page_cgroup() will be called after this.

Moreover, IIUC, pc->flags is not cleared when a page is freed and allocated
again, so a reused page still carries the old pc->flags value.
The PCG_CACHE flag, at least, is what mem_cgroup_charge_statistics() bases
its decision on, so a stale bit would matter there.
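
A minimal user-space sketch of the reuse problem, assuming uncharge clears only PCG_USED as in the hunk below. struct pc_model and all function names here are hypothetical, and plain (non-atomic) bit operations are used because the model is single-threaded. commit_clear_then_set() is just one possible fix, not necessarily what the patch should do:

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers mirror the enum in the quoted patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };

/* Hypothetical stand-in for struct page_cgroup; only flags is modelled. */
struct pc_model { unsigned long flags; };

enum charge_kind { CHARGE_CACHE, CHARGE_MAPPED };

static bool test_flag(const struct pc_model *pc, int bit)
{
	return (pc->flags >> bit) & 1UL;
}

/* Patched-style commit: bits are only ORed in, old bits are never cleared. */
static void commit_set_only(struct pc_model *pc, enum charge_kind kind)
{
	if (kind == CHARGE_CACHE)
		pc->flags |= 1UL << PCG_CACHE;
	pc->flags |= 1UL << PCG_USED;
}

/* One possible fix (an assumption): drop the per-charge bits first. */
static void commit_clear_then_set(struct pc_model *pc, enum charge_kind kind)
{
	pc->flags &= ~((1UL << PCG_CACHE) | (1UL << PCG_USED));
	commit_set_only(pc, kind);
}

/* Uncharge as modelled here clears only PCG_USED. */
static void uncharge(struct pc_model *pc)
{
	pc->flags &= ~(1UL << PCG_USED);
}

/* Charge as cache, uncharge, recharge as mapped; did PCG_CACHE survive? */
static bool stale_cache_after_reuse(void (*commit)(struct pc_model *,
						   enum charge_kind))
{
	struct pc_model pc = { .flags = 0 };

	commit(&pc, CHARGE_CACHE);
	uncharge(&pc);
	commit(&pc, CHARGE_MAPPED);
	assert(test_flag(&pc, PCG_USED));
	return test_flag(&pc, PCG_CACHE);
}
```

With commit_set_only() the recharged mapped page still looks like cache; with commit_clear_then_set() it does not.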

> @@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> +	if (mem == root_mem_cgroup)
> +		ClearPageCgroupRoot(pc);
>  	/*
>  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
> @@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
A nitpick: I would prefer not to show *.limit_in_bytes at all if we cannot write to them.


Thanks,
Daisuke Nishimura.

> @@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 09b73c5..6145ff6 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
>  
>  #endif
>  
> -
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  
>  static DEFINE_MUTEX(swap_cgroup_mutex);
>  
> 
> -- 
> 	Balbir


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-06-01  4:25   ` Daisuke Nishimura
@ 2009-06-01  5:01     ` Daisuke Nishimura
  2009-06-01  5:49     ` Balbir Singh
  1 sibling, 0 replies; 30+ messages in thread
From: Daisuke Nishimura @ 2009-06-01  5:01 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro, Daisuke Nishimura

> > @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >  
> >  	mem_cgroup_charge_statistics(mem, pc, true);
> >  
> Shouldn't we set PCG_LOCK ?
> unlock_page_cgroup() will be called after this.
> 
Ah, lock_page_cgroup() has already set it.
Please ignore this comment.

Sorry for the noise.
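
For the record, a tiny user-space sketch of why the per-bit version is fine here. It assumes PCG_LOCK is simply one bit of pc->flags that lock_page_cgroup() sets and unlock_page_cgroup() clears; the model_* and commit_* names are hypothetical and everything is single-threaded:

```c
#include <assert.h>

/* Bit numbers mirror the first three entries of the enum in the patch. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED };

/* Hypothetical stand-in for struct page_cgroup; only flags is modelled. */
struct pc_model { unsigned long flags; };

/* Single-threaded stand-in for lock_page_cgroup(). */
static void model_lock(struct pc_model *pc)
{
	assert(!(pc->flags & (1UL << PCG_LOCK)));
	pc->flags |= 1UL << PCG_LOCK;
}

/* Single-threaded stand-in for unlock_page_cgroup(). */
static void model_unlock(struct pc_model *pc)
{
	assert(pc->flags & (1UL << PCG_LOCK));
	pc->flags &= ~(1UL << PCG_LOCK);
}

/* Old style: whole-word store, so the value written must carry the lock bit. */
static void commit_assign(struct pc_model *pc)
{
	pc->flags = (1UL << PCG_CACHE) | (1UL << PCG_USED) | (1UL << PCG_LOCK);
}

/* New style: per-bit sets leave the already-held lock bit alone. */
static void commit_set_bits(struct pc_model *pc)
{
	pc->flags |= 1UL << PCG_CACHE;
	pc->flags |= 1UL << PCG_USED;
}

/* Returns the flags word after lock -> commit -> unlock. */
static unsigned long charge_cycle(void (*commit)(struct pc_model *))
{
	struct pc_model pc = { 0 };

	model_lock(&pc);
	commit(&pc);
	model_unlock(&pc);
	return pc.flags;
}
```

Both styles end with the same flags word after unlock; the old one only needed PCGF_LOCK in the table because it overwrote the whole word.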

Daisuke Nishimura.


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-06-01  4:25   ` Daisuke Nishimura
  2009-06-01  5:01     ` Daisuke Nishimura
@ 2009-06-01  5:49     ` Balbir Singh
  1 sibling, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2009-06-01  5:49 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton, lizf@cn.fujitsu.com,
	menage@google.com, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-01 13:25:05]:

> I'm sorry for my very late reply.
> 
> I've been working on the stale swap cache problem for a long time as you know :)
> 
> On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > I think setting/clearing the flag here adds a race condition, because pc->flags is
> > > modified by
> > >   pc->flags = pcg_default_flags[ctype] in commit_charge().
> > > You have to change the above line to
> > > 
> > >   SetPageCgroupCache(pc) or similar..
> > >   ...
> > >   SetPageCgroupUsed(pc)
> > > 
> > > Then, you can use set_bit() without lock_page_cgroup().
> > > (Currently, pc->flags is modified only under lock_page_cgroup(), so
> > >  non-atomic code is used.)
> > >
> > 
> > Here is the next version of the patch
> > 
> > 
> > Feature: Remove the overhead associated with the root cgroup
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > This patch changes the memory cgroup and removes the overhead associated
> > with accounting all pages in the root cgroup. As a side-effect, we can
> > no longer set a memory hard limit in the root cgroup.
> > 
> I agree with the idea itself.
>

Thanks!
 
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well. Flags are now set atomically for page_cgroup,
> > pcg_default_flags is now obsolete, but I've not removed it yet. It
> > provides some readability to help the code.
> > 
> > Tests:
> > 1. Tested lightly; previous versions showed a good performance improvement (10%).
> > 
> You should test the current version :)
> I also think you should test this patch under global memory pressure, to
> check that it doesn't cause bugs or underflow/overflow of something, etc.
> memcg's LRU handling of SwapCache differs from the usual one.
> 

OK, I've tested it using my stress tool, but I'll modify it to add some
of the things you've pointed out.

> > NOTE:
> > I haven't got the time right now to run oprofile and get detailed test results,
> > since I am in the middle of travel.
> > 
> > Please review the code for functional correctness; if you can test
> > it, even better. I would like to push this in, especially if the %
> > performance difference I am seeing is reproducible elsewhere as well.
> > 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> > 
> >  include/linux/page_cgroup.h |   12 ++++++++++++
> >  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 50 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..ebdae9a 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ROOT, /* page belongs to root cgroup */
> > +	PCG_ACCT, /* page has been accounted for */
> >  };
> >  
> Those new flags are protected by zone->lru_lock, right?
> If so, please add a comment saying so.
> Also, I'm not sure why you need two flags. Isn't PCG_ROOT enough for you?
>

Nope, PCG_ROOT is not enough: the accounting (PCG_ACCT) is independent of
charge/uncharge.
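
A rough user-space sketch of what I mean. It assumes a root page can be uncharged while it is still counted on the LRU. All names are hypothetical, zstat stands in for MEM_CGROUP_ZSTAT(mz, lru), and del_lru_root_only() is only an imagined variant for comparison, not code from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-page state touched by the patch. */
struct pc_model {
	bool root;     /* models PCG_ROOT: set at charge, cleared at uncharge */
	bool acct;     /* models PCG_ACCT: set at LRU add, cleared at LRU delete */
	bool on_list;  /* models !list_empty(&pc->lru) */
};

static long zstat; /* models the per-zone LRU counter */

static void charge_root(struct pc_model *pc) { pc->root = true; }
static void uncharge(struct pc_model *pc)    { pc->root = false; }

/* Counts the page, but keeps root pages off the per-cgroup list. */
static void add_lru(struct pc_model *pc)
{
	zstat += 1;
	pc->acct = true;
	if (pc->root)
		return;
	pc->on_list = true;
}

/* Delete path following the patch: PCG_ACCT says whether we were counted. */
static void del_lru(struct pc_model *pc)
{
	if (!pc->acct && !pc->on_list)
		return;
	zstat -= 1;
	pc->acct = false;
	if (pc->root)
		return;
	pc->on_list = false;
}

/* Imagined variant with only PCG_ROOT available as the indicator. */
static void del_lru_root_only(struct pc_model *pc)
{
	if (!pc->root && !pc->on_list)
		return;
	zstat -= 1;
	pc->on_list = false;
}

/* charge -> add to LRU -> uncharge -> delete from LRU; returns the counter. */
static long leftover_after_uncharge_then_del(void (*del)(struct pc_model *))
{
	struct pc_model pc = { false, false, false };

	zstat = 0;
	charge_root(&pc);
	add_lru(&pc);
	uncharge(&pc);
	del(&pc);
	return zstat;
}
```

In this model the PCG_ACCT version brings the counter back to zero, while the PCG_ROOT-only variant leaves it at one because PCG_ROOT is already gone when the page leaves the LRU.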
 
-- 
	Balbir


end of thread, other threads:[~2009-06-15  4:22 UTC | newest]

Thread overview: 30+ messages
2009-05-15 17:45 [RFC] Low overhead patches for the memory cgroup controller (v2) KAMEZAWA Hiroyuki
2009-05-15 18:16 ` Balbir Singh
2009-05-18 10:11   ` KAMEZAWA Hiroyuki
2009-05-18 10:45     ` Balbir Singh
2009-05-18 16:01       ` KAMEZAWA Hiroyuki
2009-05-19 13:18         ` Balbir Singh
2009-05-31 23:51     ` Balbir Singh
2009-06-01 23:57       ` KAMEZAWA Hiroyuki
2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
2009-06-05  5:51           ` KAMEZAWA Hiroyuki
2009-06-05  9:33             ` Balbir Singh
2009-06-08  0:20               ` Daisuke Nishimura
2009-06-05  6:05           ` Daisuke Nishimura
2009-06-05  9:47             ` Balbir Singh
2009-06-08  0:03               ` Daisuke Nishimura
2009-06-05  6:43           ` Daisuke Nishimura
2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
2009-06-15  2:04             ` KAMEZAWA Hiroyuki
2009-06-15  2:18             ` Daisuke Nishimura
2009-06-15  2:23               ` KAMEZAWA Hiroyuki
2009-06-15  2:44                 ` Balbir Singh
2009-06-15  3:00               ` Balbir Singh
2009-06-15  3:09                 ` Daisuke Nishimura
2009-06-15  3:22                   ` Balbir Singh
2009-06-15  3:46                     ` Daisuke Nishimura
2009-06-15  4:22                       ` Balbir Singh
2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
2009-06-01  4:25   ` Daisuke Nishimura
2009-06-01  5:01     ` Daisuke Nishimura
2009-06-01  5:49     ` Balbir Singh
