* [PATCH 0/2] memcg patches around event counting...softlimit and thresholds
@ 2010-02-12 6:44 KAMEZAWA Hiroyuki
2010-02-12 6:47 ` [PATCH 1/2] memcg : update softlimit and threshold at commit KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 6:44 UTC (permalink / raw)
To: linux-mm@kvack.org
Cc: Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
These two patches update memcg's event counter.

Memcg currently has two counters that count the same thing; only their users
differ. This series combines them into one.

Event counting is done per page, but the event check is done per charge.
However, move_task et al. now do charge() in a batched manner, so it is
better to do the event check per page rather than per charge.

(*) One could argue that the threshold check should be done at charge().
    But the event counter is not incremented at charge() anyway, so checking
    thresholds there would need some other scheme. Also, checking thresholds
    at "precharge" can make the event notifier misfire. So checking thresholds
    at commit makes some sense, I think.

I wondered whether to mark this as RFC, but these patches address concerns
I have had since memcg-threshold was merged, so I did not.

Any comments are welcome. (I'm sorry if my replies are delayed.)
Thanks,
-Kame
* [PATCH 1/2] memcg : update softlimit and threshold at commit.
2010-02-12 6:44 [PATCH 0/2] memcg patches around event counting...softlimit and thresholds KAMEZAWA Hiroyuki
@ 2010-02-12 6:47 ` KAMEZAWA Hiroyuki
2010-02-12 7:33 ` Daisuke Nishimura
2010-02-12 6:48 ` [PATCH 2/2] memcg: share event counter rather than duplicate KAMEZAWA Hiroyuki
2010-02-12 9:05 ` [PATCH 0/2] memcg patches around event counting...softlimit and thresholds v2 KAMEZAWA Hiroyuki
2 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 6:47 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
move_task has introduced "batched" precharge. Because res_counter and css
refcnt updates are not scalable operations for memcg, charge()s should be
done in a batched manner where possible.

Currently, softlimit and threshold check their event counters in try_charge(),
but this charge is not a per-page event, and the event counter is not updated
at charge(). Moreover, precharge does not pass a "page" to try_charge(), so
the softlimit tree is never updated until an uncharge() causes an event.

So the best place to check the event counter is commit_charge(), which is a
per-page event by nature. This patch moves the checks there.
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
+++ mmotm-2.6.33-Feb10/mm/memcontrol.c
@@ -1463,7 +1463,7 @@ static int __mem_cgroup_try_charge(struc
unsigned long flags = 0;
if (consume_stock(mem))
- goto charged;
+ goto done;
ret = res_counter_charge(&mem->res, csize, &fail_res);
if (likely(!ret)) {
@@ -1558,16 +1558,7 @@ static int __mem_cgroup_try_charge(struc
}
if (csize > PAGE_SIZE)
refill_stock(mem, csize - PAGE_SIZE);
-charged:
- /*
- * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
- * if they exceeds softlimit.
- */
- if (page && mem_cgroup_soft_limit_check(mem))
- mem_cgroup_update_tree(mem, page);
done:
- if (mem_cgroup_threshold_check(mem))
- mem_cgroup_threshold(mem);
return 0;
nomem:
css_put(&mem->css);
@@ -1691,6 +1682,16 @@ static void __mem_cgroup_commit_charge(s
mem_cgroup_charge_statistics(mem, pc, true);
unlock_page_cgroup(pc);
+ /*
+ * "charge_statistics" updated event counter. Then, check it.
+ * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
+ * if they exceeds softlimit.
+ */
+ if (mem_cgroup_soft_limit_check(mem))
+ mem_cgroup_update_tree(mem, pc->page);
+ if (mem_cgroup_threshold_check(mem))
+ mem_cgroup_threshold(mem);
+
}
/**
* [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 6:44 [PATCH 0/2] memcg patches around event counting...softlimit and thresholds KAMEZAWA Hiroyuki
2010-02-12 6:47 ` [PATCH 1/2] memcg : update softlimit and threshold at commit KAMEZAWA Hiroyuki
@ 2010-02-12 6:48 ` KAMEZAWA Hiroyuki
2010-02-12 7:40 ` Daisuke Nishimura
` (2 more replies)
2010-02-12 9:05 ` [PATCH 0/2] memcg patches around event counting...softlimit and thresholds v2 KAMEZAWA Hiroyuki
2 siblings, 3 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 6:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
Memcg has two event counters that count "the same" event; only their users
differ. This patch merges them into a single counter.

The new logic uses one "only increment, never reset" counter plus a mask per
check. The softlimit check used to run once per 1000 events, so a similar
check can now be done with !(new_counter & 0x3ff). The threshold check used
to run once per 100 events, so a similar check can be done with
!(new_counter & 0x7f).
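As a rough illustration of the masked check, here is a stand-alone sketch in
plain C (not the memcg code itself; the names and the main() loop are invented
for the example):

#include <stdbool.h>
#include <stdio.h>

#define SOFTLIMIT_EVENTS_MASK	0x3ff	/* fires once per 1024 events */
#define THRESHOLDS_EVENTS_MASK	0x7f	/* fires once per 128 events */

static unsigned long events;		/* per-cpu in the real code */

static bool softlimit_check(void) { return !(events & SOFTLIMIT_EVENTS_MASK); }
static bool threshold_check(void) { return !(events & THRESHOLDS_EVENTS_MASK); }

int main(void)
{
	for (int i = 0; i < 2048; i++) {
		events++;		/* one pagein/pageout */
		if (threshold_check())
			printf("threshold check fires at %lu\n", events);
		if (softlimit_check())
			printf("softlimit check fires at %lu\n", events);
	}
	return 0;
}

The counter is never reset; each user only tests whether its low bits are all
zero, which is true once every mask+1 events.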
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 36 ++++++++++++------------------------
1 file changed, 12 insertions(+), 24 deletions(-)
Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
+++ mmotm-2.6.33-Feb10/mm/memcontrol.c
@@ -63,8 +63,8 @@ static int really_do_swap_account __init
#define do_swap_account (0)
#endif
-#define SOFTLIMIT_EVENTS_THRESH (1000)
-#define THRESHOLDS_EVENTS_THRESH (100)
+#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
+#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
/*
* Statistics for memory cgroup.
@@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
- MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
- used by soft limit implementation */
- MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
- used by threshold implementation */
+ MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
MEM_CGROUP_STAT_NSTATS,
};
@@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
{
- bool ret = false;
s64 val;
- val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
- if (unlikely(val < 0)) {
- this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
- SOFTLIMIT_EVENTS_THRESH);
- ret = true;
- }
- return ret;
+ val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
+ if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
+ return true;
+ return false;
}
static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
@@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
else
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
- __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
- __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
+ __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
preempt_enable();
}
@@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
{
- bool ret = false;
s64 val;
- val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
- if (unlikely(val < 0)) {
- this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
- THRESHOLDS_EVENTS_THRESH);
- ret = true;
- }
- return ret;
+ val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
+ if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
+ return true;
+ return false;
}
static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
* Re: [PATCH 1/2] memcg : update softlimit and threshold at commit.
2010-02-12 6:47 ` [PATCH 1/2] memcg : update softlimit and threshold at commit KAMEZAWA Hiroyuki
@ 2010-02-12 7:33 ` Daisuke Nishimura
2010-02-12 7:42 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 21+ messages in thread
From: Daisuke Nishimura @ 2010-02-12 7:33 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
akpm@linux-foundation.org, Daisuke Nishimura
On Fri, 12 Feb 2010 15:47:13 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Now, move_task introduced "batched" precharge. Because res_counter or css's refcnt
> are not-scalable jobs for memcg, charge()s should be done in batched manner
> if allowed.
>
> Now, softlimit and threshold check their event counter in try_charge, but
> this charge() is not per-page event. And event counter is not updated at charge().
> Moreover, precharge doesn't pass "page" to try_charge() and softlimit tree
> will be never updated until uncharge() causes an event.
>
> So, the best place to check the event counter is commit_charge(). This is
> per-page event by its nature. This patch move checks to there.
>
I agree with this direction.
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> @@ -1463,7 +1463,7 @@ static int __mem_cgroup_try_charge(struc
> unsigned long flags = 0;
>
> if (consume_stock(mem))
> - goto charged;
> + goto done;
>
> ret = res_counter_charge(&mem->res, csize, &fail_res);
> if (likely(!ret)) {
> @@ -1558,16 +1558,7 @@ static int __mem_cgroup_try_charge(struc
> }
> if (csize > PAGE_SIZE)
> refill_stock(mem, csize - PAGE_SIZE);
> -charged:
> - /*
> - * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> - * if they exceeds softlimit.
> - */
> - if (page && mem_cgroup_soft_limit_check(mem))
> - mem_cgroup_update_tree(mem, page);
> done:
> - if (mem_cgroup_threshold_check(mem))
> - mem_cgroup_threshold(mem);
> return 0;
> nomem:
> css_put(&mem->css);
After this change, @page can be removed from the arg of try_charge().
Thanks,
Daisuke Nishimura.
> @@ -1691,6 +1682,16 @@ static void __mem_cgroup_commit_charge(s
> mem_cgroup_charge_statistics(mem, pc, true);
>
> unlock_page_cgroup(pc);
> + /*
> + * "charge_statistics" updated event counter. Then, check it.
> + * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> + * if they exceeds softlimit.
> + */
> + if (mem_cgroup_soft_limit_check(mem))
> + mem_cgroup_update_tree(mem, pc->page);
> + if (mem_cgroup_threshold_check(mem))
> + mem_cgroup_threshold(mem);
> +
> }
>
> /**
>
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 6:48 ` [PATCH 2/2] memcg: share event counter rather than duplicate KAMEZAWA Hiroyuki
@ 2010-02-12 7:40 ` Daisuke Nishimura
2010-02-12 7:41 ` KAMEZAWA Hiroyuki
2010-02-12 7:46 ` Kirill A. Shutemov
2010-02-12 8:07 ` Kirill A. Shutemov
2 siblings, 1 reply; 21+ messages in thread
From: Daisuke Nishimura @ 2010-02-12 7:40 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
akpm@linux-foundation.org, Daisuke Nishimura
On Fri, 12 Feb 2010 15:48:57 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Memcg has 2 eventcountes which counts "the same" event. Just usages are
> different from each other. This patch tries to reduce event counter.
>
> This patch's logic uses "only increment, no reset" new_counter and masks for each
> checks. Softlimit chesk was done per 1000 events. So, the similar check
> can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> events. So, the similar check can be done by (!new_counter & 0x7f)
>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 36 ++++++++++++------------------------
> 1 file changed, 12 insertions(+), 24 deletions(-)
>
> Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> @@ -63,8 +63,8 @@ static int really_do_swap_account __init
> #define do_swap_account (0)
> #endif
>
> -#define SOFTLIMIT_EVENTS_THRESH (1000)
> -#define THRESHOLDS_EVENTS_THRESH (100)
> +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
> +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
>
> /*
> * Statistics for memory cgroup.
> @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> - used by soft limit implementation */
> - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> - used by threshold implementation */
> + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
>
> MEM_CGROUP_STAT_NSTATS,
> };
> @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
>
> static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> {
> - bool ret = false;
> s64 val;
>
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> - SOFTLIMIT_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
> + return true;
> + return false;
> }
>
> static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> else
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
>
I think using __this_cpu_inc() would be more natural (and the patch
description says "increment" :)).
Thanks,
Daisuke Nishimura.
> preempt_enable();
> }
> @@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
>
> static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> {
> - bool ret = false;
> s64 val;
>
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> - THRESHOLDS_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> + if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
> + return true;
> + return false;
> }
>
> static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 7:40 ` Daisuke Nishimura
@ 2010-02-12 7:41 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 7:41 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
akpm@linux-foundation.org
On Fri, 12 Feb 2010 16:40:18 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On Fri, 12 Feb 2010 15:48:57 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Memcg has 2 eventcountes which counts "the same" event. Just usages are
> > different from each other. This patch tries to reduce event counter.
> >
> > This patch's logic uses "only increment, no reset" new_counter and masks for each
> > checks. Softlimit chesk was done per 1000 events. So, the similar check
> > can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> > events. So, the similar check can be done by (!new_counter & 0x7f)
> >
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 36 ++++++++++++------------------------
> > 1 file changed, 12 insertions(+), 24 deletions(-)
> >
> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> > @@ -63,8 +63,8 @@ static int really_do_swap_account __init
> > #define do_swap_account (0)
> > #endif
> >
> > -#define SOFTLIMIT_EVENTS_THRESH (1000)
> > -#define THRESHOLDS_EVENTS_THRESH (100)
> > +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
> > +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
> >
> > /*
> > * Statistics for memory cgroup.
> > @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
> > MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> > MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> > MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> > - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> > - used by soft limit implementation */
> > - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> > - used by threshold implementation */
> > + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
> >
> > MEM_CGROUP_STAT_NSTATS,
> > };
> > @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
> >
> > static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> > {
> > - bool ret = false;
> > s64 val;
> >
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> > - SOFTLIMIT_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
> > + return true;
> > + return false;
> > }
> >
> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> > @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> > else
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
> >
> I think using __this_cpu_inc() would be more natural(and the patch description
> says "increment" :)).
>
Yes...yes. I will post v2.
Thanks,
-Kame
> Thanks,
> Daisuke Nishimura.
>
> > preempt_enable();
> > }
> > @@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
> >
> > static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> > {
> > - bool ret = false;
> > s64 val;
> >
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> > - THRESHOLDS_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > + if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
> > + return true;
> > + return false;
> > }
> >
> > static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> >
>
* Re: [PATCH 1/2] memcg : update softlimit and threshold at commit.
2010-02-12 7:33 ` Daisuke Nishimura
@ 2010-02-12 7:42 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 7:42 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
akpm@linux-foundation.org
On Fri, 12 Feb 2010 16:33:11 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On Fri, 12 Feb 2010 15:47:13 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Now, move_task introduced "batched" precharge. Because res_counter or css's refcnt
> > are not-scalable jobs for memcg, charge()s should be done in batched manner
> > if allowed.
> >
> > Now, softlimit and threshold check their event counter in try_charge, but
> > this charge() is not per-page event. And event counter is not updated at charge().
> > Moreover, precharge doesn't pass "page" to try_charge() and softlimit tree
> > will be never updated until uncharge() causes an event.
> >
> > So, the best place to check the event counter is commit_charge(). This is
> > per-page event by its nature. This patch move checks to there.
> >
> I agree to this direction.
>
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 23 ++++++++++++-----------
> > 1 file changed, 12 insertions(+), 11 deletions(-)
> >
> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> > @@ -1463,7 +1463,7 @@ static int __mem_cgroup_try_charge(struc
> > unsigned long flags = 0;
> >
> > if (consume_stock(mem))
> > - goto charged;
> > + goto done;
> >
> > ret = res_counter_charge(&mem->res, csize, &fail_res);
> > if (likely(!ret)) {
> > @@ -1558,16 +1558,7 @@ static int __mem_cgroup_try_charge(struc
> > }
> > if (csize > PAGE_SIZE)
> > refill_stock(mem, csize - PAGE_SIZE);
> > -charged:
> > - /*
> > - * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> > - * if they exceeds softlimit.
> > - */
> > - if (page && mem_cgroup_soft_limit_check(mem))
> > - mem_cgroup_update_tree(mem, page);
> > done:
> > - if (mem_cgroup_threshold_check(mem))
> > - mem_cgroup_threshold(mem);
> > return 0;
> > nomem:
> > css_put(&mem->css);
> After this change, @page can be removed from the arg of try_charge().
>
Ah, hmm, good point. Will update.
Thanks,
-Kame
>
> Thanks,
> Daisuke Nishimura.
>
> > @@ -1691,6 +1682,16 @@ static void __mem_cgroup_commit_charge(s
> > mem_cgroup_charge_statistics(mem, pc, true);
> >
> > unlock_page_cgroup(pc);
> > + /*
> > + * "charge_statistics" updated event counter. Then, check it.
> > + * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> > + * if they exceeds softlimit.
> > + */
> > + if (mem_cgroup_soft_limit_check(mem))
> > + mem_cgroup_update_tree(mem, pc->page);
> > + if (mem_cgroup_threshold_check(mem))
> > + mem_cgroup_threshold(mem);
> > +
> > }
> >
> > /**
> >
> >
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 7:46 ` Kirill A. Shutemov
@ 2010-02-12 7:46 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 7:46 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, 12 Feb 2010 09:46:17 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Fri, Feb 12, 2010 at 8:48 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Memcg has 2 eventcountes which counts "the same" event. Just usages are
> > different from each other. This patch tries to reduce event counter.
> >
> > This patch's logic uses "only increment, no reset" new_counter and masks for each
> > checks. Softlimit chesk was done per 1000 events. So, the similar check
> > can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> > events. So, the similar check can be done by (!new_counter & 0x7f)
> >
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 36 ++++++++++++------------------------
> > 1 file changed, 12 insertions(+), 24 deletions(-)
> >
> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> > @@ -63,8 +63,8 @@ static int really_do_swap_account __init
> > #define do_swap_account (0)
> > #endif
> >
> > -#define SOFTLIMIT_EVENTS_THRESH (1000)
> > -#define THRESHOLDS_EVENTS_THRESH (100)
> > +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
> > +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
>
> Probably, better to define it as power of two here. Like
>
> #define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
> #define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
>
> And change logic of checks accordingly. What do you think?
>
Okay, maybe it's cleaner. I'll try that.
> > /*
> > * Statistics for memory cgroup.
> > @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
> > MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> > MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> > MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> > - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> > - used by soft limit implementation */
> > - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> > - used by threshold implementation */
> > + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
> >
> > MEM_CGROUP_STAT_NSTATS,
> > };
> > @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
> >
> > static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> > {
> > - bool ret = false;
> > s64 val;
> >
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> > - SOFTLIMIT_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
> > + return true;
> > + return false;
> > }
> >
> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> > @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> > else
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
>
> Decrement??
>
My bug. I'll fix it.
Thanks,
-Kame
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 6:48 ` [PATCH 2/2] memcg: share event counter rather than duplicate KAMEZAWA Hiroyuki
2010-02-12 7:40 ` Daisuke Nishimura
@ 2010-02-12 7:46 ` Kirill A. Shutemov
2010-02-12 7:46 ` KAMEZAWA Hiroyuki
2010-02-12 8:07 ` Kirill A. Shutemov
2 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2010-02-12 7:46 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, Feb 12, 2010 at 8:48 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Memcg has 2 eventcountes which counts "the same" event. Just usages are
> different from each other. This patch tries to reduce event counter.
>
> This patch's logic uses "only increment, no reset" new_counter and masks for each
> checks. Softlimit chesk was done per 1000 events. So, the similar check
> can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> events. So, the similar check can be done by (!new_counter & 0x7f)
>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 36 ++++++++++++------------------------
> 1 file changed, 12 insertions(+), 24 deletions(-)
>
> Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> @@ -63,8 +63,8 @@ static int really_do_swap_account __init
> #define do_swap_account (0)
> #endif
>
> -#define SOFTLIMIT_EVENTS_THRESH (1000)
> -#define THRESHOLDS_EVENTS_THRESH (100)
> +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
> +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
It would probably be better to define these as powers of two here, like:
#define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
#define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
and change the logic of the checks accordingly. What do you think?
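Sketched out (illustrative only, not a tested change; event_check() is just a
helper name invented for the example), the shift-based variant might look like:

#include <stdbool.h>
#include <stdint.h>

#define SOFTLIMIT_EVENTS_THRESH  10	/* fires once in 2^10 = 1024 events */
#define THRESHOLDS_EVENTS_THRESH 7	/* fires once in 2^7  = 128 events */

/* True once every (1 << shift) events, for a counter that only increments. */
static bool event_check(int64_t val, int shift)
{
	return !(val & ((INT64_C(1) << shift) - 1));
}

so the period stays an explicit power of two and the mask is derived from it.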
> /*
> * Statistics for memory cgroup.
> @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> - used by soft limit implementation */
> - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> - used by threshold implementation */
> + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
>
> MEM_CGROUP_STAT_NSTATS,
> };
> @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
>
> static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> {
> - bool ret = false;
> s64 val;
>
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> - SOFTLIMIT_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
> + return true;
> + return false;
> }
>
> static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> else
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
Decrement??
> preempt_enable();
> }
> @@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
>
> static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> {
> - bool ret = false;
> s64 val;
>
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> - THRESHOLDS_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> + if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
> + return true;
> + return false;
> }
>
> static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
>
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 6:48 ` [PATCH 2/2] memcg: share event counter rather than duplicate KAMEZAWA Hiroyuki
2010-02-12 7:40 ` Daisuke Nishimura
2010-02-12 7:46 ` Kirill A. Shutemov
@ 2010-02-12 8:07 ` Kirill A. Shutemov
2010-02-12 8:19 ` KAMEZAWA Hiroyuki
2 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2010-02-12 8:07 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, Feb 12, 2010 at 8:48 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Memcg has 2 eventcountes which counts "the same" event. Just usages are
> different from each other. This patch tries to reduce event counter.
>
> This patch's logic uses "only increment, no reset" new_counter and masks for each
> checks. Softlimit chesk was done per 1000 events. So, the similar check
> can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> events. So, the similar check can be done by (!new_counter & 0x7f)
IIUC, with this change we have to check the counter after each update, since
the check tests for an exact value. So we have to move the checks into
mem_cgroup_charge_statistics(), or call them after each statistics update.
I'm not sure how that affects performance.
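A toy stand-alone example of the issue (the numbers are made up for the
example): if events arrive in batches and the masked check runs only once per
batch, most exact multiples of the period are never observed:

#include <stdio.h>

#define MASK 0x7f	/* period of 128 events */

int main(void)
{
	unsigned long events = 0;
	int fired = 0;

	/* 1000 batches of 10 events each, one check per batch */
	for (int i = 0; i < 1000; i++) {
		events += 10;
		if (!(events & MASK))
			fired++;
	}
	/*
	 * Fires only when events is a multiple of both 10 and 128,
	 * i.e. 15 times, instead of the ~78 times that per-event
	 * checks would fire over the same 10000 events.
	 */
	printf("checks fired: %d over %lu events\n", fired, events);
	return 0;
}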
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 36 ++++++++++++------------------------
> 1 file changed, 12 insertions(+), 24 deletions(-)
>
> Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> @@ -63,8 +63,8 @@ static int really_do_swap_account __init
> #define do_swap_account (0)
> #endif
>
> -#define SOFTLIMIT_EVENTS_THRESH (1000)
> -#define THRESHOLDS_EVENTS_THRESH (100)
> +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
> +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
>
> /*
> * Statistics for memory cgroup.
> @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> - used by soft limit implementation */
> - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> - used by threshold implementation */
> + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
>
> MEM_CGROUP_STAT_NSTATS,
> };
> @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
>
> static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> {
> - bool ret = false;
> s64 val;
>
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> - SOFTLIMIT_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
> + return true;
> + return false;
> }
>
> static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> else
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
>
> preempt_enable();
> }
> @@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
>
> static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> {
> - bool ret = false;
> s64 val;
>
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> - THRESHOLDS_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> + if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
> + return true;
> + return false;
> }
>
> static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
>
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 8:07 ` Kirill A. Shutemov
@ 2010-02-12 8:19 ` KAMEZAWA Hiroyuki
2010-02-12 8:49 ` Kirill A. Shutemov
0 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 8:19 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, 12 Feb 2010 10:07:25 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Fri, Feb 12, 2010 at 8:48 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Memcg has 2 eventcountes which counts "the same" event. Just usages are
> > different from each other. This patch tries to reduce event counter.
> >
> > This patch's logic uses "only increment, no reset" new_counter and masks for each
> > checks. Softlimit chesk was done per 1000 events. So, the similar check
> > can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> > events. So, the similar check can be done by (!new_counter & 0x7f)
>
> IIUC, with this change we have to check counter after each update,
> since we check
> for exact value.
Yes.
> So we have to move checks to mem_cgroup_charge_statistics() or
> call them after each statistics charging. I'm not sure how it affects
> performance.
>
My patch 1/2 does it.
But hmm, move-task does its counter updates in an asynchronous manner, so
there is a bug there. I'll add a check in the next version.

Maybe calling update_tree and threshold_check at the end of move_task is
better. Do threshold users care about the batched-move behaviour of
task_move? Should we check one by one?

(Maybe there will be more trouble when we handle hugepages...)
Thanks,
-Kame
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 36 ++++++++++++------------------------
> > 1 file changed, 12 insertions(+), 24 deletions(-)
> >
> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> > @@ -63,8 +63,8 @@ static int really_do_swap_account __init
> > #define do_swap_account (0)
> > #endif
> >
> > -#define SOFTLIMIT_EVENTS_THRESH (1000)
> > -#define THRESHOLDS_EVENTS_THRESH (100)
> > +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
> > +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
> >
> > /*
> > * Statistics for memory cgroup.
> > @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
> > MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> > MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> > MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> > - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> > - used by soft limit implementation */
> > - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> > - used by threshold implementation */
> > + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
> >
> > MEM_CGROUP_STAT_NSTATS,
> > };
> > @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
> >
> > static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> > {
> > - bool ret = false;
> > s64 val;
> >
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> > - SOFTLIMIT_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
> > + return true;
> > + return false;
> > }
> >
> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> > @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> > else
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
> >
> > preempt_enable();
> > }
> > @@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
> >
> > static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> > {
> > - bool ret = false;
> > s64 val;
> >
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> > - THRESHOLDS_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > + if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
> > + return true;
> > + return false;
> > }
> >
> > static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> >
> >
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 8:19 ` KAMEZAWA Hiroyuki
@ 2010-02-12 8:49 ` Kirill A. Shutemov
2010-02-12 8:51 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2010-02-12 8:49 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, Feb 12, 2010 at 10:19 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 12 Feb 2010 10:07:25 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
>> On Fri, Feb 12, 2010 at 8:48 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > Memcg has 2 eventcountes which counts "the same" event. Just usages are
>> > different from each other. This patch tries to reduce event counter.
>> >
>> > This patch's logic uses "only increment, no reset" new_counter and masks for each
>> > checks. Softlimit chesk was done per 1000 events. So, the similar check
>> > can be done by !(new_counter & 0x3ff). Threshold check was done per 100
>> > events. So, the similar check can be done by (!new_counter & 0x7f)
>>
>> IIUC, with this change we have to check counter after each update,
>> since we check
>> for exact value.
>
> Yes.
>> So we have to move checks to mem_cgroup_charge_statistics() or
>> call them after each statistics charging. I'm not sure how it affects
>> performance.
>>
>
> My patch 1/2 does it.
>
> But hmm, move-task does counter updates in asynchronous manner. Then, there are
> bug. I'll add check in the next version.
>
> Maybe calling update_tree and threshold_check at the end of mova_task is
> better. Does thresholds user take care of batched-move manner in task_move ?
> Should we check one by one ?
No, mem_cgroup_threshold() at mem_cgroup_move_task() is enough.

But... is task moving a critical path? If not, it is probably cleaner to check
everything in mem_cgroup_charge_statistics().
> (Maybe there will be another trouble when we handle hugepages...)
Yes, hugepages support requires more testing.
> Thanks,
> -Kame
>
>
>> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
>> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > ---
>> > mm/memcontrol.c | 36 ++++++++++++------------------------
>> > 1 file changed, 12 insertions(+), 24 deletions(-)
>> >
>> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
>> > ===================================================================
>> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
>> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
>> > @@ -63,8 +63,8 @@ static int really_do_swap_account __init
>> > #define do_swap_account (0)
>> > #endif
>> >
>> > -#define SOFTLIMIT_EVENTS_THRESH (1000)
>> > -#define THRESHOLDS_EVENTS_THRESH (100)
>> > +#define SOFTLIMIT_EVENTS_THRESH (0x3ff) /* once in 1024 */
>> > +#define THRESHOLDS_EVENTS_THRESH (0x7f) /* once in 128 */
>> >
>> > /*
>> > * Statistics for memory cgroup.
>> > @@ -79,10 +79,7 @@ enum mem_cgroup_stat_index {
>> > MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
>> > MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
>> > MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>> > - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
>> > - used by soft limit implementation */
>> > - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
>> > - used by threshold implementation */
>> > + MEM_CGROUP_EVENTS, /* incremented by 1 at pagein/pageout */
>> >
>> > MEM_CGROUP_STAT_NSTATS,
>> > };
>> > @@ -394,16 +391,12 @@ mem_cgroup_remove_exceeded(struct mem_cg
>> >
>> > static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
>> > {
>> > - bool ret = false;
>> > s64 val;
>> >
>> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
>> > - if (unlikely(val < 0)) {
>> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
>> > - SOFTLIMIT_EVENTS_THRESH);
>> > - ret = true;
>> > - }
>> > - return ret;
>> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
>> > + if (unlikely(!(val & SOFTLIMIT_EVENTS_THRESH)))
>> > + return true;
>> > + return false;
>> > }
>> >
>> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
>> > @@ -542,8 +535,7 @@ static void mem_cgroup_charge_statistics
>> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
>> > else
>> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
>> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
>> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
>> > + __this_cpu_dec(mem->stat->count[MEM_CGROUP_EVENTS]);
>> >
>> > preempt_enable();
>> > }
>> > @@ -3211,16 +3203,12 @@ static int mem_cgroup_swappiness_write(s
>> >
>> > static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
>> > {
>> > - bool ret = false;
>> > s64 val;
>> >
>> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
>> > - if (unlikely(val < 0)) {
>> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
>> > - THRESHOLDS_EVENTS_THRESH);
>> > - ret = true;
>> > - }
>> > - return ret;
>> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
>> > + if (unlikely(!(val & THRESHOLDS_EVENTS_THRESH)))
>> > + return true;
>> > + return false;
>> > }
>> >
>> > static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
>> >
>> >
>>
>
>
* Re: [PATCH 2/2] memcg: share event counter rather than duplicate
2010-02-12 8:49 ` Kirill A. Shutemov
@ 2010-02-12 8:51 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 8:51 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, 12 Feb 2010 10:49:45 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Fri, Feb 12, 2010 at 10:19 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Fri, 12 Feb 2010 10:07:25 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >
> >> On Fri, Feb 12, 2010 at 8:48 AM, KAMEZAWA Hiroyuki
> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> > Memcg has 2 eventcountes which counts "the same" event. Just usages are
> >> > different from each other. This patch tries to reduce event counter.
> >> >
> >> > This patch's logic uses "only increment, no reset" new_counter and masks for each
> >> > checks. Softlimit chesk was done per 1000 events. So, the similar check
> >> > can be done by !(new_counter & 0x3ff). Threshold check was done per 100
> >> > events. So, the similar check can be done by (!new_counter & 0x7f)
> >>
> >> IIUC, with this change we have to check counter after each update,
> >> since we check
> >> for exact value.
> >
> > Yes.
> >> So we have to move checks to mem_cgroup_charge_statistics() or
> >> call them after each statistics charging. I'm not sure how it affects
> >> performance.
> >>
> >
> > My patch 1/2 does it.
> >
> > But hmm, move-task does counter updates in asynchronous manner. Then, there are
> > bug. I'll add check in the next version.
> >
> > Maybe calling update_tree and threshold_check at the end of mova_task is
> > better. Does thresholds user take care of batched-move manner in task_move ?
> > Should we check one by one ?
>
> No. mem_cgroup_threshold() at mem_cgroup_move_task() is enough.
>
> But... Is task moving a critical path? If no, It's, probably, cleaner to check
> everything at mem_cgroup_charge_statistics().
>
The trouble is that charge_statistics() is called under lock_page_cgroup(),
and I don't want to call anything heavy under it.
(And I'm not entirely sure whether calling charge_statistics() without
lock_page_cgroup() is dangerous or not; I think there is some race. But if
there is a race, it is a very subtle one, so I leave it as it is.)
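For reference, the ordering that patch 1/2 in this thread sets up looks
roughly like this (condensed from the diff above; kernel code, not compilable
on its own):

static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
				       struct page_cgroup *pc,
				       enum charge_type ctype)
{
	lock_page_cgroup(pc);
	/* ... set up pc->mem_cgroup and flags ... */
	mem_cgroup_charge_statistics(mem, pc, true);	/* bumps the event counter */
	unlock_page_cgroup(pc);

	/* the heavier softlimit/threshold work runs outside the lock */
	if (mem_cgroup_soft_limit_check(mem))
		mem_cgroup_update_tree(mem, pc->page);
	if (mem_cgroup_threshold_check(mem))
		mem_cgroup_threshold(mem);
}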
Maybe my next version will be simple enough. Thank you for the review.
Regards,
-Kame
* [PATCH 0/2] memcg patches around event counting...softlimit and thresholds v2
2010-02-12 6:44 [PATCH 0/2] memcg patches around event counting...softlimit and thresholds KAMEZAWA Hiroyuki
2010-02-12 6:47 ` [PATCH 1/2] memcg : update softlimit and threshold at commit KAMEZAWA Hiroyuki
2010-02-12 6:48 ` [PATCH 2/2] memcg: share event counter rather than duplicate KAMEZAWA Hiroyuki
@ 2010-02-12 9:05 ` KAMEZAWA Hiroyuki
2010-02-12 9:06 ` [PATCH 1/2] memcg: update threshold and softlimit at commit v2 KAMEZAWA Hiroyuki
2010-02-12 9:09 ` [PATCH 2/2] memcg : share event counter rather than duplicate v2 KAMEZAWA Hiroyuki
2 siblings, 2 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 9:05 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
Thank you for the reviews. This is v2.
Thanks,
-Kame
On Fri, 12 Feb 2010 15:44:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> These 2 patches are updates for memcg's event counter.
>
> Memcg has 2 counters but they counts the same thing. Just usages are
> different from each other. This patch tries to combine them.
>
> Event counting is done per page but event check is done per charge.
> But, now, move_task at el. does charge() in batched manner. So, it's better
> to do event check per page (not per charge.)
>
> (*) There may be an opinion that threshold check should be done at charge().
> But, at charge(), event counter is not incremented, anyway.
> Then, some another idea is appreciated to check thresholds at charges.
> In other view, checking threshold at "precharge" can cause miss-fire of
> event notifier. So, checking threshold at commit has some sense, I think.
>
> I wonder I should add RFC..but this patch clears my concerns since memcg-threshold
> was merged. So, I didn't.
>
> Any comment is welcome. (I'm sorry if my reply is delayed.)
>
> Thanks,
> -Kame
>
>
* [PATCH 1/2] memcg: update threshold and softlimit at commit v2
2010-02-12 9:05 ` [PATCH 0/2] memcg patches around event counting...softlimit and thresholds v2 KAMEZAWA Hiroyuki
@ 2010-02-12 9:06 ` KAMEZAWA Hiroyuki
2010-02-12 9:09 ` [PATCH 2/2] memcg : share event counter rather than duplicate v2 KAMEZAWA Hiroyuki
1 sibling, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 9:06 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
Now, move_task does "batched" precharge. Because res_counter and css's refcnt
are not scalable for memcg, try_charge() tends to be done in a batched
manner when allowed.
Now, softlimit and threshold check their event counter in try_charge(), but
a charge is not a per-page event, and the event counter is not updated at charge().
Moreover, precharge doesn't pass a "page" to try_charge(), so the softlimit tree
will never be updated until an uncharge() causes an event.
So, the best place to check the event counter is commit_charge(). This is a
per-page event by its nature. This patch moves the checks there.
Changelog: 2010/02/12
- removed the argument "page" from try_charge(). After this, try_charge()
is independent of what the page is.
(Maybe transparent hugepage or something will need to add an argument in the future.)
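As a rough sketch of the resulting flow (illustration only, not part of this
patch: the helper name, the loop and the charge type below are made up, and
error unwinding is omitted):

/*
 * Sketch: a batched mover precharges without knowing the pages, while
 * commit still runs once per page -- which is where charge_statistics()
 * and (after this patch) the softlimit/threshold checks happen.
 */
static int example_move_pages(struct mem_cgroup *memcg,
			      struct page **pages, int nr)
{
	int i, ret;

	/* precharge: no page is known here, so no per-page event check */
	for (i = 0; i < nr; i++) {
		ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &memcg, false);
		if (ret)
			return ret;	/* real code would uncharge what it already took */
	}
	/* commit: once per page; the event counter is bumped and checked here */
	for (i = 0; i < nr; i++)
		__mem_cgroup_commit_charge(memcg, lookup_page_cgroup(pages[i]),
					   MEM_CGROUP_CHARGE_TYPE_MAPPED);
	return 0;
}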
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 38 ++++++++++++++++++--------------------
1 file changed, 18 insertions(+), 20 deletions(-)
Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
+++ mmotm-2.6.33-Feb10/mm/memcontrol.c
@@ -1424,8 +1424,7 @@ static int __cpuinit memcg_stock_cpu_cal
* oom-killer can be invoked.
*/
static int __mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcg,
- bool oom, struct page *page)
+ gfp_t gfp_mask, struct mem_cgroup **memcg, bool oom)
{
struct mem_cgroup *mem, *mem_over_limit;
int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
@@ -1463,7 +1462,7 @@ static int __mem_cgroup_try_charge(struc
unsigned long flags = 0;
if (consume_stock(mem))
- goto charged;
+ goto done;
ret = res_counter_charge(&mem->res, csize, &fail_res);
if (likely(!ret)) {
@@ -1558,16 +1557,7 @@ static int __mem_cgroup_try_charge(struc
}
if (csize > PAGE_SIZE)
refill_stock(mem, csize - PAGE_SIZE);
-charged:
- /*
- * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
- * if they exceeds softlimit.
- */
- if (page && mem_cgroup_soft_limit_check(mem))
- mem_cgroup_update_tree(mem, page);
done:
- if (mem_cgroup_threshold_check(mem))
- mem_cgroup_threshold(mem);
return 0;
nomem:
css_put(&mem->css);
@@ -1691,6 +1681,16 @@ static void __mem_cgroup_commit_charge(s
mem_cgroup_charge_statistics(mem, pc, true);
unlock_page_cgroup(pc);
+ /*
+ * "charge_statistics" updated event counter. Then, check it.
+ * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
+ * if they exceeds softlimit.
+ */
+ if (mem_cgroup_soft_limit_check(mem))
+ mem_cgroup_update_tree(mem, pc->page);
+ if (mem_cgroup_threshold_check(mem))
+ mem_cgroup_threshold(mem);
+
}
/**
@@ -1788,7 +1788,7 @@ static int mem_cgroup_move_parent(struct
goto put;
parent = mem_cgroup_from_cont(pcg);
- ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, page);
+ ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false);
if (ret || !parent)
goto put_back;
@@ -1824,7 +1824,7 @@ static int mem_cgroup_charge_common(stru
prefetchw(pc);
mem = memcg;
- ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, true, page);
+ ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, true);
if (ret || !mem)
return ret;
@@ -1944,14 +1944,14 @@ int mem_cgroup_try_charge_swapin(struct
if (!mem)
goto charge_cur_mm;
*ptr = mem;
- ret = __mem_cgroup_try_charge(NULL, mask, ptr, true, page);
+ ret = __mem_cgroup_try_charge(NULL, mask, ptr, true);
/* drop extra refcnt from tryget */
css_put(&mem->css);
return ret;
charge_cur_mm:
if (unlikely(!mm))
mm = &init_mm;
- return __mem_cgroup_try_charge(mm, mask, ptr, true, page);
+ return __mem_cgroup_try_charge(mm, mask, ptr, true);
}
static void
@@ -2340,8 +2340,7 @@ int mem_cgroup_prepare_migration(struct
unlock_page_cgroup(pc);
if (mem) {
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false,
- page);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
css_put(&mem->css);
}
*ptr = mem;
@@ -3863,8 +3862,7 @@ one_by_one:
batch_count = PRECHARGE_COUNT_AT_ONCE;
cond_resched();
}
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem,
- false, NULL);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
if (ret || !mem)
/* mem_cgroup_clear_mc() will do uncharge later */
return -ENOMEM;
* [PATCH 2/2] memcg : share event counter rather than duplicate v2
2010-02-12 9:05 ` [PATCH 0/2] memcg patches around event counting...softlimit and thresholds v2 KAMEZAWA Hiroyuki
2010-02-12 9:06 ` [PATCH 1/2] memcg: update threshold and softlimit at commit v2 KAMEZAWA Hiroyuki
@ 2010-02-12 9:09 ` KAMEZAWA Hiroyuki
2010-02-12 11:48 ` Daisuke Nishimura
2010-02-15 10:57 ` Kirill A. Shutemov
1 sibling, 2 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-12 9:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
Memcg has 2 event counters which count "the same" event; only their usages
differ. This patch reduces them to a single event counter.
The new logic uses an "only increment, no reset" counter and a mask for each
check. The softlimit check was done once per 1000 events, so a similar check
can be done by !(new_counter & 0x3ff). The threshold check was done once per
100 events, so a similar check can be done by !(new_counter & 0x7f).
ALL event checks are done right after the EVENT percpu counter is updated.
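For reference, a tiny stand-alone demonstration of the mask-based check
(illustration only, not kernel code; the shift values mirror the defines
added below):

#include <stdio.h>

#define THRESHOLDS_EVENTS_THRESH 7	/* fires once in 128 events */
#define SOFTLIMIT_EVENTS_THRESH 10	/* fires once in 1024 events */

/* same idea as the patch: fire when the low bits of the counter are all zero */
static int event_check(unsigned long events, int shift)
{
	return !(events & ((1UL << shift) - 1));
}

int main(void)
{
	unsigned long events, thresholds = 0, softlimit = 0;

	for (events = 1; events <= 4096; events++) {
		if (event_check(events, THRESHOLDS_EVENTS_THRESH))
			thresholds++;
		if (event_check(events, SOFTLIMIT_EVENTS_THRESH))
			softlimit++;
	}
	/* prints "threshold checks: 32, softlimit checks: 4" */
	printf("threshold checks: %lu, softlimit checks: %lu\n",
	       thresholds, softlimit);
	return 0;
}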
Changelog: 2010/02/12
- fixed to use "inc" rather than "dec"
- modified to be more unified style of counter handling.
- taking care of account-move.
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 86 ++++++++++++++++++++++++++------------------------------
1 file changed, 41 insertions(+), 45 deletions(-)
Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
+++ mmotm-2.6.33-Feb10/mm/memcontrol.c
@@ -63,8 +63,15 @@ static int really_do_swap_account __init
#define do_swap_account (0)
#endif
-#define SOFTLIMIT_EVENTS_THRESH (1000)
-#define THRESHOLDS_EVENTS_THRESH (100)
+/*
+ * Per memcg event counter is incremented at every pagein/pageout. This counter
+ * is used for trigger some periodic events. This is straightforward and better
+ * than using jiffies etc. to handle periodic memcg event.
+ *
+ * These values will be used as !((event) & ((1 <<(thresh)) - 1))
+ */
+#define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
+#define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
/*
* Statistics for memory cgroup.
@@ -79,10 +86,7 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
- MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
- used by soft limit implementation */
- MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
- used by threshold implementation */
+ MEM_CGROUP_EVENTS, /* incremented at every pagein/pageout */
MEM_CGROUP_STAT_NSTATS,
};
@@ -154,7 +158,6 @@ struct mem_cgroup_threshold_ary {
struct mem_cgroup_threshold entries[0];
};
-static bool mem_cgroup_threshold_check(struct mem_cgroup *mem);
static void mem_cgroup_threshold(struct mem_cgroup *mem);
/*
@@ -392,19 +395,6 @@ mem_cgroup_remove_exceeded(struct mem_cg
spin_unlock(&mctz->lock);
}
-static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
-{
- bool ret = false;
- s64 val;
-
- val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
- if (unlikely(val < 0)) {
- this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
- SOFTLIMIT_EVENTS_THRESH);
- ret = true;
- }
- return ret;
-}
static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
{
@@ -542,8 +532,7 @@ static void mem_cgroup_charge_statistics
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
else
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
- __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
- __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
+ __this_cpu_inc(mem->stat->count[MEM_CGROUP_EVENTS]);
preempt_enable();
}
@@ -563,6 +552,29 @@ static unsigned long mem_cgroup_get_loca
return total;
}
+static bool __memcg_event_check(struct mem_cgroup *mem, int event_mask_shift)
+{
+ s64 val;
+
+ val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
+
+ return !(val & ((1 << event_mask_shift) - 1));
+}
+
+/*
+ * Check events in order.
+ *
+ */
+static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
+{
+ /* threshold event is triggered in finer grain than soft limit */
+ if (unlikely(__memcg_event_check(mem, THRESHOLDS_EVENTS_THRESH))) {
+ mem_cgroup_threshold(mem);
+ if (unlikely(__memcg_event_check(mem, SOFTLIMIT_EVENTS_THRESH)))
+ mem_cgroup_update_tree(mem, page);
+ }
+}
+
static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
{
return container_of(cgroup_subsys_state(cont,
@@ -1686,11 +1698,7 @@ static void __mem_cgroup_commit_charge(s
* Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
* if they exceeds softlimit.
*/
- if (mem_cgroup_soft_limit_check(mem))
- mem_cgroup_update_tree(mem, pc->page);
- if (mem_cgroup_threshold_check(mem))
- mem_cgroup_threshold(mem);
-
+ memcg_check_events(mem, pc->page);
}
/**
@@ -1760,6 +1768,11 @@ static int mem_cgroup_move_account(struc
ret = 0;
}
unlock_page_cgroup(pc);
+ /*
+ * check events
+ */
+ memcg_check_events(to, pc->page);
+ memcg_check_events(from, pc->page);
return ret;
}
@@ -2128,10 +2141,7 @@ __mem_cgroup_uncharge_common(struct page
mz = page_cgroup_zoneinfo(pc);
unlock_page_cgroup(pc);
- if (mem_cgroup_soft_limit_check(mem))
- mem_cgroup_update_tree(mem, page);
- if (mem_cgroup_threshold_check(mem))
- mem_cgroup_threshold(mem);
+ memcg_check_events(mem, page);
/* at swapout, this memcg will be accessed to record to swap */
if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
css_put(&mem->css);
@@ -3207,20 +3217,6 @@ static int mem_cgroup_swappiness_write(s
return 0;
}
-static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
-{
- bool ret = false;
- s64 val;
-
- val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
- if (unlikely(val < 0)) {
- this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
- THRESHOLDS_EVENTS_THRESH);
- ret = true;
- }
- return ret;
-}
-
static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
{
struct mem_cgroup_threshold_ary *t;
* Re: [PATCH 2/2] memcg : share event counter rather than duplicate v2
2010-02-12 9:09 ` [PATCH 2/2] memcg : share event counter rather than duplicate v2 KAMEZAWA Hiroyuki
@ 2010-02-12 11:48 ` Daisuke Nishimura
2010-02-15 0:19 ` KAMEZAWA Hiroyuki
2010-02-15 10:57 ` Kirill A. Shutemov
1 sibling, 1 reply; 21+ messages in thread
From: Daisuke Nishimura @ 2010-02-12 11:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, Kirill A. Shutemov, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, 12 Feb 2010 18:09:52 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Memcg has 2 event counters which count "the same" event; only their usages
> differ. This patch reduces them to a single event counter.
>
> The new logic uses an "only increment, no reset" counter and a mask for each
> check. The softlimit check was done once per 1000 events, so a similar check
> can be done by !(new_counter & 0x3ff). The threshold check was done once per
> 100 events, so a similar check can be done by !(new_counter & 0x7f).
>
> ALL event checks are done right after the EVENT percpu counter is updated.
>
> Changelog: 2010/02/12
> - fixed to use "inc" rather than "dec"
> - modified to be more unified style of counter handling.
> - taking care of account-move.
>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 86 ++++++++++++++++++++++++++------------------------------
> 1 file changed, 41 insertions(+), 45 deletions(-)
>
> Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> @@ -63,8 +63,15 @@ static int really_do_swap_account __init
> #define do_swap_account (0)
> #endif
>
> -#define SOFTLIMIT_EVENTS_THRESH (1000)
> -#define THRESHOLDS_EVENTS_THRESH (100)
> +/*
> + * Per memcg event counter is incremented at every pagein/pageout. This counter
> + * is used for trigger some periodic events. This is straightforward and better
> + * than using jiffies etc. to handle periodic memcg event.
> + *
> + * These values will be used as !((event) & ((1 <<(thresh)) - 1))
> + */
> +#define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
> +#define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
>
> /*
> * Statistics for memory cgroup.
> @@ -79,10 +86,7 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> - used by soft limit implementation */
> - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> - used by threshold implementation */
> + MEM_CGROUP_EVENTS, /* incremented at every pagein/pageout */
>
> MEM_CGROUP_STAT_NSTATS,
> };
> @@ -154,7 +158,6 @@ struct mem_cgroup_threshold_ary {
> struct mem_cgroup_threshold entries[0];
> };
>
> -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem);
> static void mem_cgroup_threshold(struct mem_cgroup *mem);
>
> /*
> @@ -392,19 +395,6 @@ mem_cgroup_remove_exceeded(struct mem_cg
> spin_unlock(&mctz->lock);
> }
>
> -static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> -{
> - bool ret = false;
> - s64 val;
> -
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> - SOFTLIMIT_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> -}
>
> static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> {
> @@ -542,8 +532,7 @@ static void mem_cgroup_charge_statistics
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> else
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> + __this_cpu_inc(mem->stat->count[MEM_CGROUP_EVENTS]);
>
> preempt_enable();
> }
> @@ -563,6 +552,29 @@ static unsigned long mem_cgroup_get_loca
> return total;
> }
>
> +static bool __memcg_event_check(struct mem_cgroup *mem, int event_mask_shift)
> +{
> + s64 val;
> +
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> +
> + return !(val & ((1 << event_mask_shift) - 1));
> +}
> +
> +/*
> + * Check events in order.
> + *
> + */
> +static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
> +{
> + /* threshold event is triggered in finer grain than soft limit */
> + if (unlikely(__memcg_event_check(mem, THRESHOLDS_EVENTS_THRESH))) {
> + mem_cgroup_threshold(mem);
> + if (unlikely(__memcg_event_check(mem, SOFTLIMIT_EVENTS_THRESH)))
> + mem_cgroup_update_tree(mem, page);
> + }
> +}
> +
> static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
> {
> return container_of(cgroup_subsys_state(cont,
> @@ -1686,11 +1698,7 @@ static void __mem_cgroup_commit_charge(s
> * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> * if they exceeds softlimit.
> */
> - if (mem_cgroup_soft_limit_check(mem))
> - mem_cgroup_update_tree(mem, pc->page);
> - if (mem_cgroup_threshold_check(mem))
> - mem_cgroup_threshold(mem);
> -
> + memcg_check_events(mem, pc->page);
> }
>
> /**
> @@ -1760,6 +1768,11 @@ static int mem_cgroup_move_account(struc
> ret = 0;
> }
> unlock_page_cgroup(pc);
> + /*
> + * check events
> + */
> + memcg_check_events(to, pc->page);
> + memcg_check_events(from, pc->page);
> return ret;
> }
>
Strictly speaking, "if (!ret)" would be needed (it's not a big deal, though).
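Roughly, something like this (an untested sketch of the suggestion, not a patch):

	unlock_page_cgroup(pc);
	if (!ret) {
		/* only fire the event checks when the move actually happened */
		memcg_check_events(to, pc->page);
		memcg_check_events(from, pc->page);
	}
	return ret;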
Thanks,
Daisuke Nishimura.
> @@ -2128,10 +2141,7 @@ __mem_cgroup_uncharge_common(struct page
> mz = page_cgroup_zoneinfo(pc);
> unlock_page_cgroup(pc);
>
> - if (mem_cgroup_soft_limit_check(mem))
> - mem_cgroup_update_tree(mem, page);
> - if (mem_cgroup_threshold_check(mem))
> - mem_cgroup_threshold(mem);
> + memcg_check_events(mem, page);
> /* at swapout, this memcg will be accessed to record to swap */
> if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> css_put(&mem->css);
> @@ -3207,20 +3217,6 @@ static int mem_cgroup_swappiness_write(s
> return 0;
> }
>
> -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> -{
> - bool ret = false;
> - s64 val;
> -
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> - THRESHOLDS_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> -}
> -
> static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> {
> struct mem_cgroup_threshold_ary *t;
>
* Re: [PATCH 2/2] memcg : share event counter rather than duplicate v2
2010-02-12 11:48 ` Daisuke Nishimura
@ 2010-02-15 0:19 ` KAMEZAWA Hiroyuki
2010-03-09 23:15 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-15 0:19 UTC (permalink / raw)
To: nishimura
Cc: Daisuke Nishimura, linux-mm@kvack.org, Kirill A. Shutemov,
balbir@linux.vnet.ibm.com, akpm@linux-foundation.org
On Fri, 12 Feb 2010 20:48:10 +0900
Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> wrote:
> On Fri, 12 Feb 2010 18:09:52 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > Memcg has 2 event counters which count "the same" event; only their usages
> > differ. This patch reduces them to a single event counter.
> >
> > The new logic uses an "only increment, no reset" counter and a mask for each
> > check. The softlimit check was done once per 1000 events, so a similar check
> > can be done by !(new_counter & 0x3ff). The threshold check was done once per
> > 100 events, so a similar check can be done by !(new_counter & 0x7f).
> >
> > ALL event checks are done right after the EVENT percpu counter is updated.
> >
> > Changelog: 2010/02/12
> > - fixed to use "inc" rather than "dec"
> > - modified to be more unified style of counter handling.
> > - taking care of account-move.
> >
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 86 ++++++++++++++++++++++++++------------------------------
> > 1 file changed, 41 insertions(+), 45 deletions(-)
> >
> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> > @@ -63,8 +63,15 @@ static int really_do_swap_account __init
> > #define do_swap_account (0)
> > #endif
> >
> > -#define SOFTLIMIT_EVENTS_THRESH (1000)
> > -#define THRESHOLDS_EVENTS_THRESH (100)
> > +/*
> > + * Per memcg event counter is incremented at every pagein/pageout. This counter
> > + * is used for trigger some periodic events. This is straightforward and better
> > + * than using jiffies etc. to handle periodic memcg event.
> > + *
> > + * These values will be used as !((event) & ((1 <<(thresh)) - 1))
> > + */
> > +#define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
> > +#define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
> >
> > /*
> > * Statistics for memory cgroup.
> > @@ -79,10 +86,7 @@ enum mem_cgroup_stat_index {
> > MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> > MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> > MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> > - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> > - used by soft limit implementation */
> > - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> > - used by threshold implementation */
> > + MEM_CGROUP_EVENTS, /* incremented at every pagein/pageout */
> >
> > MEM_CGROUP_STAT_NSTATS,
> > };
> > @@ -154,7 +158,6 @@ struct mem_cgroup_threshold_ary {
> > struct mem_cgroup_threshold entries[0];
> > };
> >
> > -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem);
> > static void mem_cgroup_threshold(struct mem_cgroup *mem);
> >
> > /*
> > @@ -392,19 +395,6 @@ mem_cgroup_remove_exceeded(struct mem_cg
> > spin_unlock(&mctz->lock);
> > }
> >
> > -static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> > -{
> > - bool ret = false;
> > - s64 val;
> > -
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> > - SOFTLIMIT_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > -}
> >
> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> > {
> > @@ -542,8 +532,7 @@ static void mem_cgroup_charge_statistics
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> > else
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > + __this_cpu_inc(mem->stat->count[MEM_CGROUP_EVENTS]);
> >
> > preempt_enable();
> > }
> > @@ -563,6 +552,29 @@ static unsigned long mem_cgroup_get_loca
> > return total;
> > }
> >
> > +static bool __memcg_event_check(struct mem_cgroup *mem, int event_mask_shift)
> > +{
> > + s64 val;
> > +
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > +
> > + return !(val & ((1 << event_mask_shift) - 1));
> > +}
> > +
> > +/*
> > + * Check events in order.
> > + *
> > + */
> > +static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
> > +{
> > + /* threshold event is triggered in finer grain than soft limit */
> > + if (unlikely(__memcg_event_check(mem, THRESHOLDS_EVENTS_THRESH))) {
> > + mem_cgroup_threshold(mem);
> > + if (unlikely(__memcg_event_check(mem, SOFTLIMIT_EVENTS_THRESH)))
> > + mem_cgroup_update_tree(mem, page);
> > + }
> > +}
> > +
> > static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
> > {
> > return container_of(cgroup_subsys_state(cont,
> > @@ -1686,11 +1698,7 @@ static void __mem_cgroup_commit_charge(s
> > * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> > * if they exceeds softlimit.
> > */
> > - if (mem_cgroup_soft_limit_check(mem))
> > - mem_cgroup_update_tree(mem, pc->page);
> > - if (mem_cgroup_threshold_check(mem))
> > - mem_cgroup_threshold(mem);
> > -
> > + memcg_check_events(mem, pc->page);
> > }
> >
> > /**
> > @@ -1760,6 +1768,11 @@ static int mem_cgroup_move_account(struc
> > ret = 0;
> > }
> > unlock_page_cgroup(pc);
> > + /*
> > + * check events
> > + */
> > + memcg_check_events(to, pc->page);
> > + memcg_check_events(from, pc->page);
> > return ret;
> > }
> >
> Strictly speaking, "if (!ret)" would be needed (it's not a big deal, though).
>
Hmm. ok. I'll check.
Thanks,
-kame
> Thanks,
> Daisuke Nishimura.
>
> > @@ -2128,10 +2141,7 @@ __mem_cgroup_uncharge_common(struct page
> > mz = page_cgroup_zoneinfo(pc);
> > unlock_page_cgroup(pc);
> >
> > - if (mem_cgroup_soft_limit_check(mem))
> > - mem_cgroup_update_tree(mem, page);
> > - if (mem_cgroup_threshold_check(mem))
> > - mem_cgroup_threshold(mem);
> > + memcg_check_events(mem, page);
> > /* at swapout, this memcg will be accessed to record to swap */
> > if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> > css_put(&mem->css);
> > @@ -3207,20 +3217,6 @@ static int mem_cgroup_swappiness_write(s
> > return 0;
> > }
> >
> > -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> > -{
> > - bool ret = false;
> > - s64 val;
> > -
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> > - THRESHOLDS_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > -}
> > -
> > static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> > {
> > struct mem_cgroup_threshold_ary *t;
> >
* Re: [PATCH 2/2] memcg : share event counter rather than duplicate v2
2010-02-12 9:09 ` [PATCH 2/2] memcg : share event counter rather than duplicate v2 KAMEZAWA Hiroyuki
2010-02-12 11:48 ` Daisuke Nishimura
@ 2010-02-15 10:57 ` Kirill A. Shutemov
2010-02-16 0:16 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2010-02-15 10:57 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Fri, Feb 12, 2010 at 11:09 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Memcg has 2 event counters which count "the same" event; only their usages
> differ. This patch reduces them to a single event counter.
>
> The new logic uses an "only increment, no reset" counter and a mask for each
> check. The softlimit check was done once per 1000 events, so a similar check
> can be done by !(new_counter & 0x3ff). The threshold check was done once per
> 100 events, so a similar check can be done by !(new_counter & 0x7f).
>
> ALL event checks are done right after the EVENT percpu counter is updated.
>
> Changelog: 2010/02/12
> - fixed to use "inc" rather than "dec"
> - modified to be more unified style of counter handling.
> - taking care of account-move.
>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 86 ++++++++++++++++++++++++++------------------------------
> 1 file changed, 41 insertions(+), 45 deletions(-)
>
> Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> @@ -63,8 +63,15 @@ static int really_do_swap_account __init
> #define do_swap_account (0)
> #endif
>
> -#define SOFTLIMIT_EVENTS_THRESH (1000)
> -#define THRESHOLDS_EVENTS_THRESH (100)
> +/*
> + * Per memcg event counter is incremented at every pagein/pageout. This counter
> + * is used for trigger some periodic events. This is straightforward and better
> + * than using jiffies etc. to handle periodic memcg event.
> + *
> + * These values will be used as !((event) & ((1 <<(thresh)) - 1))
> + */
> +#define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
> +#define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
>
> /*
> * Statistics for memory cgroup.
> @@ -79,10 +86,7 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> - used by soft limit implementation */
> - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> - used by threshold implementation */
> + MEM_CGROUP_EVENTS, /* incremented at every pagein/pageout */
>
> MEM_CGROUP_STAT_NSTATS,
> };
> @@ -154,7 +158,6 @@ struct mem_cgroup_threshold_ary {
> struct mem_cgroup_threshold entries[0];
> };
>
> -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem);
> static void mem_cgroup_threshold(struct mem_cgroup *mem);
>
> /*
> @@ -392,19 +395,6 @@ mem_cgroup_remove_exceeded(struct mem_cg
> spin_unlock(&mctz->lock);
> }
>
> -static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> -{
> - bool ret = false;
> - s64 val;
> -
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> - SOFTLIMIT_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> -}
>
> static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> {
> @@ -542,8 +532,7 @@ static void mem_cgroup_charge_statistics
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> else
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> + __this_cpu_inc(mem->stat->count[MEM_CGROUP_EVENTS]);
>
> preempt_enable();
> }
> @@ -563,6 +552,29 @@ static unsigned long mem_cgroup_get_loca
> return total;
> }
>
> +static bool __memcg_event_check(struct mem_cgroup *mem, int event_mask_shift)
inline?
> +{
> + s64 val;
> +
> + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> +
> + return !(val & ((1 << event_mask_shift) - 1));
> +}
> +
> +/*
> + * Check events in order.
> + *
> + */
> +static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
Ditto.
> +{
> + /* threshold event is triggered in finer grain than soft limit */
> + if (unlikely(__memcg_event_check(mem, THRESHOLDS_EVENTS_THRESH))) {
> + mem_cgroup_threshold(mem);
> + if (unlikely(__memcg_event_check(mem, SOFTLIMIT_EVENTS_THRESH)))
> + mem_cgroup_update_tree(mem, page);
> + }
> +}
> +
> static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
> {
> return container_of(cgroup_subsys_state(cont,
> @@ -1686,11 +1698,7 @@ static void __mem_cgroup_commit_charge(s
> * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> * if they exceeds softlimit.
> */
> - if (mem_cgroup_soft_limit_check(mem))
> - mem_cgroup_update_tree(mem, pc->page);
> - if (mem_cgroup_threshold_check(mem))
> - mem_cgroup_threshold(mem);
> -
> + memcg_check_events(mem, pc->page);
> }
>
> /**
> @@ -1760,6 +1768,11 @@ static int mem_cgroup_move_account(struc
> ret = 0;
> }
> unlock_page_cgroup(pc);
> + /*
> + * check events
> + */
> + memcg_check_events(to, pc->page);
> + memcg_check_events(from, pc->page);
> return ret;
> }
>
> @@ -2128,10 +2141,7 @@ __mem_cgroup_uncharge_common(struct page
> mz = page_cgroup_zoneinfo(pc);
> unlock_page_cgroup(pc);
>
> - if (mem_cgroup_soft_limit_check(mem))
> - mem_cgroup_update_tree(mem, page);
> - if (mem_cgroup_threshold_check(mem))
> - mem_cgroup_threshold(mem);
> + memcg_check_events(mem, page);
> /* at swapout, this memcg will be accessed to record to swap */
> if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> css_put(&mem->css);
> @@ -3207,20 +3217,6 @@ static int mem_cgroup_swappiness_write(s
> return 0;
> }
>
> -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem)
> -{
> - bool ret = false;
> - s64 val;
> -
> - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> - if (unlikely(val < 0)) {
> - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS],
> - THRESHOLDS_EVENTS_THRESH);
> - ret = true;
> - }
> - return ret;
> -}
> -
> static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> {
> struct mem_cgroup_threshold_ary *t;
>
>
* Re: [PATCH 2/2] memcg : share event counter rather than duplicate v2
2010-02-15 10:57 ` Kirill A. Shutemov
@ 2010-02-16 0:16 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-16 0:16 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org
On Mon, 15 Feb 2010 12:57:30 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Fri, Feb 12, 2010 at 11:09 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Memcg has 2 event counters which count "the same" event; only their usages
> > differ. This patch reduces them to a single event counter.
> >
> > The new logic uses an "only increment, no reset" counter and a mask for each
> > check. The softlimit check was done once per 1000 events, so a similar check
> > can be done by !(new_counter & 0x3ff). The threshold check was done once per
> > 100 events, so a similar check can be done by !(new_counter & 0x7f).
> >
> > ALL event checks are done right after the EVENT percpu counter is updated.
> >
> > Changelog: 2010/02/12
> > - fixed to use "inc" rather than "dec"
> > - modified to be more unified style of counter handling.
> > - taking care of account-move.
> >
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 86 ++++++++++++++++++++++++++------------------------------
> > 1 file changed, 41 insertions(+), 45 deletions(-)
> >
> > Index: mmotm-2.6.33-Feb10/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb10.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb10/mm/memcontrol.c
> > @@ -63,8 +63,15 @@ static int really_do_swap_account __init
> > #define do_swap_account (0)
> > #endif
> >
> > -#define SOFTLIMIT_EVENTS_THRESH (1000)
> > -#define THRESHOLDS_EVENTS_THRESH (100)
> > +/*
> > + * Per memcg event counter is incremented at every pagein/pageout. This counter
> > + * is used for trigger some periodic events. This is straightforward and better
> > + * than using jiffies etc. to handle periodic memcg event.
> > + *
> > + * These values will be used as !((event) & ((1 <<(thresh)) - 1))
> > + */
> > +#define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
> > +#define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
> >
> > /*
> > * Statistics for memory cgroup.
> > @@ -79,10 +86,7 @@ enum mem_cgroup_stat_index {
> > MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
> > MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> > MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> > - MEM_CGROUP_STAT_SOFTLIMIT, /* decrements on each page in/out.
> > - used by soft limit implementation */
> > - MEM_CGROUP_STAT_THRESHOLDS, /* decrements on each page in/out.
> > - used by threshold implementation */
> > + MEM_CGROUP_EVENTS, /* incremented at every pagein/pageout */
> >
> > MEM_CGROUP_STAT_NSTATS,
> > };
> > @@ -154,7 +158,6 @@ struct mem_cgroup_threshold_ary {
> > struct mem_cgroup_threshold entries[0];
> > };
> >
> > -static bool mem_cgroup_threshold_check(struct mem_cgroup *mem);
> > static void mem_cgroup_threshold(struct mem_cgroup *mem);
> >
> > /*
> > @@ -392,19 +395,6 @@ mem_cgroup_remove_exceeded(struct mem_cg
> > spin_unlock(&mctz->lock);
> > }
> >
> > -static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem)
> > -{
> > - bool ret = false;
> > - s64 val;
> > -
> > - val = this_cpu_read(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - if (unlikely(val < 0)) {
> > - this_cpu_write(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT],
> > - SOFTLIMIT_EVENTS_THRESH);
> > - ret = true;
> > - }
> > - return ret;
> > -}
> >
> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> > {
> > @@ -542,8 +532,7 @@ static void mem_cgroup_charge_statistics
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
> > else
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_SOFTLIMIT]);
> > - __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_THRESHOLDS]);
> > + __this_cpu_inc(mem->stat->count[MEM_CGROUP_EVENTS]);
> >
> > preempt_enable();
> > }
> > @@ -563,6 +552,29 @@ static unsigned long mem_cgroup_get_loca
> > return total;
> > }
> >
> > +static bool __memcg_event_check(struct mem_cgroup *mem, int event_mask_shift)
>
> inline?
>
> > +{
> > + s64 val;
> > +
> > + val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);
> > +
> > + return !(val & ((1 << event_mask_shift) - 1));
> > +}
> > +
> > +/*
> > + * Check events in order.
> > + *
> > + */
> > +static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
>
> Ditto.
>
I'd like to depend on the compiler.
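For reference, the suggested form would just add the keyword (sketch only;
a compiler is usually free to inline such a small static function by itself,
which is what depending on the compiler means here):

	static inline bool __memcg_event_check(struct mem_cgroup *mem,
					       int event_mask_shift)
	{
		s64 val = this_cpu_read(mem->stat->count[MEM_CGROUP_EVENTS]);

		return !(val & ((1 << event_mask_shift) - 1));
	}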
Thanks,
-Kame
* Re: [PATCH 2/2] memcg : share event counter rather than duplicate v2
2010-02-15 0:19 ` KAMEZAWA Hiroyuki
@ 2010-03-09 23:15 ` Andrew Morton
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2010-03-09 23:15 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: nishimura, Daisuke Nishimura, linux-mm@kvack.org,
Kirill A. Shutemov, balbir@linux.vnet.ibm.com
On Mon, 15 Feb 2010 09:19:06 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > /**
> > > @@ -1760,6 +1768,11 @@ static int mem_cgroup_move_account(struc
> > > ret = 0;
> > > }
> > > unlock_page_cgroup(pc);
> > > + /*
> > > + * check events
> > > + */
> > > + memcg_check_events(to, pc->page);
> > > + memcg_check_events(from, pc->page);
> > > return ret;
> > > }
> > >
> > Strictly speaking, "if (!ret)" would be needed (it's not a big deal, though).
> >
> Hmm. ok. I'll check.
I'll assume that your checking resulted in happiness with the existing
patch ;)
Thread overview: 21+ messages
2010-02-12 6:44 [PATCH 0/2] memcg patches around event counting...softlimit and thresholds KAMEZAWA Hiroyuki
2010-02-12 6:47 ` [PATCH 1/2] memcg : update softlimit and threshold at commit KAMEZAWA Hiroyuki
2010-02-12 7:33 ` Daisuke Nishimura
2010-02-12 7:42 ` KAMEZAWA Hiroyuki
2010-02-12 6:48 ` [PATCH 2/2] memcg: share event counter rather than duplicate KAMEZAWA Hiroyuki
2010-02-12 7:40 ` Daisuke Nishimura
2010-02-12 7:41 ` KAMEZAWA Hiroyuki
2010-02-12 7:46 ` Kirill A. Shutemov
2010-02-12 7:46 ` KAMEZAWA Hiroyuki
2010-02-12 8:07 ` Kirill A. Shutemov
2010-02-12 8:19 ` KAMEZAWA Hiroyuki
2010-02-12 8:49 ` Kirill A. Shutemov
2010-02-12 8:51 ` KAMEZAWA Hiroyuki
2010-02-12 9:05 ` [PATCH 0/2] memcg patches around event counting...softlimit and thresholds v2 KAMEZAWA Hiroyuki
2010-02-12 9:06 ` [PATCH 1/2] memcg: update threshold and softlimit at commit v2 KAMEZAWA Hiroyuki
2010-02-12 9:09 ` [PATCH 2/2] memcg : share event counter rather than duplicate v2 KAMEZAWA Hiroyuki
2010-02-12 11:48 ` Daisuke Nishimura
2010-02-15 0:19 ` KAMEZAWA Hiroyuki
2010-03-09 23:15 ` Andrew Morton
2010-02-15 10:57 ` Kirill A. Shutemov
2010-02-16 0:16 ` KAMEZAWA Hiroyuki