* [patch 0/4] memcg: operate on page quantities internally @ 2011-02-09 11:01 Johannes Weiner 2011-02-09 11:01 ` [patch 1/4] memcg: keep only one charge cancelling function Johannes Weiner ` (5 more replies) 0 siblings, 6 replies; 15+ messages in thread From: Johannes Weiner @ 2011-02-09 11:01 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel Hi, this patch set converts the memcg charge and uncharge paths to operate on multiples of pages instead of bytes. This was already a good idea before, but with the merge of THP we made a real mess by specifying huge pages sometimes in bytes and sometimes in numbers of regular pages. If I did not miss anything, this should leave only res_counter and the user-visible interfaces in bytes. The ABI probably won't change, so the next step is converting res_counter itself to operate on page quantities. Hannes include/linux/sched.h | 4 +- mm/memcontrol.c | 157 ++++++++++++++++++++++++------------------------- 2 files changed, 78 insertions(+), 83 deletions(-) ^ permalink raw reply [flat|nested] 15+ messages in thread
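To make the direction of the series concrete, here is a minimal userspace sketch of the convention it establishes: memcg-internal code counts pages, and the multiplication into bytes happens only at the res_counter boundary. PAGE_SHIFT, the byte counter, and the helper names below are illustrative assumptions, not kernel code.

/*
 * Minimal userspace model of the convention this series establishes:
 * memcg-internal paths track page counts, and conversion to bytes
 * happens only at the res_counter boundary.  PAGE_SHIFT and the
 * res_counter stand-in are assumptions for illustration only.
 */
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static unsigned long long res_usage;  /* stand-in for res_counter, in bytes */

static void res_counter_charge(unsigned long bytes)
{
        res_usage += bytes;
}

/* Internal interface: callers think in pages. */
static void charge_pages(unsigned int nr_pages)
{
        res_counter_charge((unsigned long)nr_pages * PAGE_SIZE);
}

int main(void)
{
        charge_pages(1);    /* one regular page */
        charge_pages(512);  /* e.g. a 2MB huge page made of 4K pages */
        printf("usage: %llu bytes (%llu pages)\n",
               res_usage, res_usage >> PAGE_SHIFT);
        return 0;
}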
* [patch 1/4] memcg: keep only one charge cancelling function 2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner @ 2011-02-09 11:01 ` Johannes Weiner 2011-02-09 23:51 ` KAMEZAWA Hiroyuki 2011-02-10 12:26 ` Johannes Weiner 2011-02-09 11:01 ` [patch 2/4] memcg: convert per-cpu stock from bytes to page granularity Johannes Weiner ` (4 subsequent siblings) 5 siblings, 2 replies; 15+ messages in thread From: Johannes Weiner @ 2011-02-09 11:01 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel We have two charge cancelling functions: one takes a page count, the other a page size. The second one just divides the parameter by PAGE_SIZE and then calls the first one. This is trivial, no need for an extra function. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 28 ++++++++++++---------------- 1 files changed, 12 insertions(+), 16 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 236f627..cabf421 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2004,22 +2004,18 @@ bypass: * This function is for that and do uncharge, put css's refcnt. * gotten by try_charge(). */ -static void __mem_cgroup_cancel_charge(struct mem_cgroup *mem, - unsigned long count) +static void mem_cgroup_cancel_charge(struct mem_cgroup *mem, + unsigned int nr_pages) { if (!mem_cgroup_is_root(mem)) { - res_counter_uncharge(&mem->res, PAGE_SIZE * count); + unsigned long bytes = nr_pages * PAGE_SIZE; + + res_counter_uncharge(&mem->res, bytes); if (do_swap_account) - res_counter_uncharge(&mem->memsw, PAGE_SIZE * count); + res_counter_uncharge(&mem->memsw, bytes); } } -static void mem_cgroup_cancel_charge(struct mem_cgroup *mem, - int page_size) -{ - __mem_cgroup_cancel_charge(mem, page_size >> PAGE_SHIFT); -} - /* * A helper function to get mem_cgroup from ID. must be called under * rcu_read_lock(). The caller must check css_is_removed() or some if @@ -2078,7 +2074,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, lock_page_cgroup(pc); if (unlikely(PageCgroupUsed(pc))) { unlock_page_cgroup(pc); - mem_cgroup_cancel_charge(mem, page_size); + mem_cgroup_cancel_charge(mem, nr_pages); return; } /* @@ -2216,7 +2212,7 @@ static int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc, mem_cgroup_charge_statistics(from, PageCgroupCache(pc), -nr_pages); if (uncharge) /* This is not "cancel", but cancel_charge does all we need. */ - mem_cgroup_cancel_charge(from, charge_size); + mem_cgroup_cancel_charge(from, nr_pages); /* caller should have done css_get */ pc->mem_cgroup = to; @@ -2281,7 +2277,7 @@ static int mem_cgroup_move_parent(struct page *page, ret = mem_cgroup_move_account(page, pc, child, parent, true, page_size); if (ret) - mem_cgroup_cancel_charge(parent, page_size); + mem_cgroup_cancel_charge(parent, page_size >> PAGE_SHIFT); if (page_size > PAGE_SIZE) compound_unlock_irqrestore(page, flags); @@ -2512,7 +2508,7 @@ void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem) return; if (!mem) return; - mem_cgroup_cancel_charge(mem, PAGE_SIZE); + mem_cgroup_cancel_charge(mem, 1); } static void @@ -4788,7 +4784,7 @@ static void __mem_cgroup_clear_mc(void) /* we must uncharge all the leftover precharges from mc.to */ if (mc.precharge) { - __mem_cgroup_cancel_charge(mc.to, mc.precharge); + mem_cgroup_cancel_charge(mc.to, mc.precharge); mc.precharge = 0; } /* @@ -4796,7 +4792,7 @@ static void __mem_cgroup_clear_mc(void) * we must uncharge here. 
*/ if (mc.moved_charge) { - __mem_cgroup_cancel_charge(mc.from, mc.moved_charge); + mem_cgroup_cancel_charge(mc.from, mc.moved_charge); mc.moved_charge = 0; } /* we must fixup refcnts and charges */ -- 1.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
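For callers, the effect of this patch can be sketched in plain userspace C (the names and the byte counter below are simplified stand-ins, not the kernel functions): the cancel path takes a page count directly, and a call site that still holds a byte quantity converts it with a shift, as the mem_cgroup_move_parent() hunk above does.

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static unsigned long res_bytes = 10 * PAGE_SIZE;  /* pretend ten charged pages */

/* After the patch: a single cancel function, parameter in pages. */
static void cancel_charge(unsigned int nr_pages)
{
        unsigned long bytes = nr_pages * PAGE_SIZE;

        res_bytes -= bytes;  /* models res_counter_uncharge() */
}

int main(void)
{
        unsigned long page_size = 4 * PAGE_SIZE;  /* a caller still holding bytes */

        cancel_charge(1);                       /* single regular page */
        cancel_charge(page_size >> PAGE_SHIFT); /* convert at the call site */
        printf("remaining: %lu pages\n", res_bytes >> PAGE_SHIFT);
        return 0;
}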
* Re: [patch 1/4] memcg: keep only one charge cancelling function 2011-02-09 11:01 ` [patch 1/4] memcg: keep only one charge cancelling function Johannes Weiner @ 2011-02-09 23:51 ` KAMEZAWA Hiroyuki 2011-02-10 12:26 ` Johannes Weiner 1 sibling, 0 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-02-09 23:51 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel On Wed, 9 Feb 2011 12:01:50 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > We have two charge cancelling functions: one takes a page count, the > other a page size. The second one just divides the parameter by > PAGE_SIZE and then calls the first one. This is trivial, no need for > an extra function. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/4] memcg: keep only one charge cancelling function 2011-02-09 11:01 ` [patch 1/4] memcg: keep only one charge cancelling function Johannes Weiner 2011-02-09 23:51 ` KAMEZAWA Hiroyuki @ 2011-02-10 12:26 ` Johannes Weiner 1 sibling, 0 replies; 15+ messages in thread From: Johannes Weiner @ 2011-02-10 12:26 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel Andrew, here is a fix for this patch that reverts to using the underscored version of the cancel function, which already took a page count. Code developped in parallel will either use the underscore version with the uncharged semantics or error out on the no longer existing version that took a number of bytes. --- From: Johannes Weiner <hannes@cmpxchg.org> Subject: [patch] memcg: keep only one charge cancelling function fix Keep the underscore-version of the charge cancelling function which took a page count, rather than silently changing the semantics of the non-underscore-version. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 804e9fc..e600b55 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2020,8 +2020,8 @@ bypass: * This function is for that and do uncharge, put css's refcnt. * gotten by try_charge(). */ -static void mem_cgroup_cancel_charge(struct mem_cgroup *mem, - unsigned int nr_pages) +static void __mem_cgroup_cancel_charge(struct mem_cgroup *mem, + unsigned int nr_pages) { if (!mem_cgroup_is_root(mem)) { unsigned long bytes = nr_pages * PAGE_SIZE; @@ -2090,7 +2090,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, lock_page_cgroup(pc); if (unlikely(PageCgroupUsed(pc))) { unlock_page_cgroup(pc); - mem_cgroup_cancel_charge(mem, nr_pages); + __mem_cgroup_cancel_charge(mem, nr_pages); return; } /* @@ -2228,7 +2228,7 @@ static int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc, mem_cgroup_charge_statistics(from, PageCgroupCache(pc), -nr_pages); if (uncharge) /* This is not "cancel", but cancel_charge does all we need. */ - mem_cgroup_cancel_charge(from, nr_pages); + __mem_cgroup_cancel_charge(from, nr_pages); /* caller should have done css_get */ pc->mem_cgroup = to; @@ -2293,7 +2293,7 @@ static int mem_cgroup_move_parent(struct page *page, ret = mem_cgroup_move_account(page, pc, child, parent, true, page_size); if (ret) - mem_cgroup_cancel_charge(parent, page_size >> PAGE_SHIFT); + __mem_cgroup_cancel_charge(parent, page_size >> PAGE_SHIFT); if (page_size > PAGE_SIZE) compound_unlock_irqrestore(page, flags); @@ -2524,7 +2524,7 @@ void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem) return; if (!mem) return; - mem_cgroup_cancel_charge(mem, 1); + __mem_cgroup_cancel_charge(mem, 1); } static void @@ -4803,7 +4803,7 @@ static void __mem_cgroup_clear_mc(void) /* we must uncharge all the leftover precharges from mc.to */ if (mc.precharge) { - mem_cgroup_cancel_charge(mc.to, mc.precharge); + __mem_cgroup_cancel_charge(mc.to, mc.precharge); mc.precharge = 0; } /* @@ -4811,7 +4811,7 @@ static void __mem_cgroup_clear_mc(void) * we must uncharge here. */ if (mc.moved_charge) { - mem_cgroup_cancel_charge(mc.from, mc.moved_charge); + __mem_cgroup_cancel_charge(mc.from, mc.moved_charge); mc.moved_charge = 0; } /* we must fixup refcnts and charges */ -- 1.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [patch 2/4] memcg: convert per-cpu stock from bytes to page granularity 2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner 2011-02-09 11:01 ` [patch 1/4] memcg: keep only one charge cancelling function Johannes Weiner @ 2011-02-09 11:01 ` Johannes Weiner 2011-02-09 23:52 ` KAMEZAWA Hiroyuki 2011-02-09 11:01 ` [patch 3/4] memcg: convert uncharge batching " Johannes Weiner ` (3 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Johannes Weiner @ 2011-02-09 11:01 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel We never keep subpage quantities in the per-cpu stock. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 24 +++++++++++++----------- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index cabf421..179fd74 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1624,14 +1624,14 @@ EXPORT_SYMBOL(mem_cgroup_update_page_stat); #define CHARGE_SIZE (32 * PAGE_SIZE) struct memcg_stock_pcp { struct mem_cgroup *cached; /* this never be root cgroup */ - int charge; + unsigned int nr_pages; struct work_struct work; }; static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); static atomic_t memcg_drain_count; /* - * Try to consume stocked charge on this cpu. If success, PAGE_SIZE is consumed + * Try to consume stocked charge on this cpu. If success, one page is consumed * from local stock and true is returned. If the stock is 0 or charges from a * cgroup which is not current target, returns false. This stock will be * refilled. @@ -1642,8 +1642,8 @@ static bool consume_stock(struct mem_cgroup *mem) bool ret = true; stock = &get_cpu_var(memcg_stock); - if (mem == stock->cached && stock->charge) - stock->charge -= PAGE_SIZE; + if (mem == stock->cached && stock->nr_pages) + stock->nr_pages--; else /* need to call res_counter_charge */ ret = false; put_cpu_var(memcg_stock); @@ -1657,13 +1657,15 @@ static void drain_stock(struct memcg_stock_pcp *stock) { struct mem_cgroup *old = stock->cached; - if (stock->charge) { - res_counter_uncharge(&old->res, stock->charge); + if (stock->nr_pages) { + unsigned long bytes = stock->nr_pages * PAGE_SIZE; + + res_counter_uncharge(&old->res, bytes); if (do_swap_account) - res_counter_uncharge(&old->memsw, stock->charge); + res_counter_uncharge(&old->memsw, bytes); + stock->nr_pages = 0; } stock->cached = NULL; - stock->charge = 0; } /* @@ -1680,7 +1682,7 @@ static void drain_local_stock(struct work_struct *dummy) * Cache charges(val) which is from res_counter, to local per_cpu area. * This will be consumed by consume_stock() function, later. */ -static void refill_stock(struct mem_cgroup *mem, int val) +static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) { struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock); @@ -1688,7 +1690,7 @@ static void refill_stock(struct mem_cgroup *mem, int val) drain_stock(stock); stock->cached = mem; } - stock->charge += val; + stock->nr_pages += nr_pages; put_cpu_var(memcg_stock); } @@ -1986,7 +1988,7 @@ again: } while (ret != CHARGE_OK); if (csize > page_size) - refill_stock(mem, csize - page_size); + refill_stock(mem, (csize - page_size) >> PAGE_SHIFT); css_put(&mem->css); done: *memcg = mem; -- 1.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [patch 2/4] memcg: convert per-cpu stock from bytes to page granularity 2011-02-09 11:01 ` [patch 2/4] memcg: convert per-cpu stock from bytes to page granularity Johannes Weiner @ 2011-02-09 23:52 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-02-09 23:52 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel On Wed, 9 Feb 2011 12:01:51 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > We never keep subpage quantities in the per-cpu stock. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
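A self-contained, single-threaded model of the stock logic after this patch (the names are invented stand-ins; the real code uses per-cpu variables and res_counter): the local cache counts whole pages, consuming takes one page at a time, and draining converts to bytes only when handing the surplus back to the counter.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct stock {
        int cached_cgroup;      /* stand-in for the cached mem_cgroup pointer */
        unsigned int nr_pages;  /* was an "int charge" in bytes before the patch */
};

static unsigned long res_usage;  /* bytes, as the counter sees them */

/* Take one page from the local cache if it belongs to @cgroup. */
static bool consume_stock(struct stock *s, int cgroup)
{
        if (s->cached_cgroup == cgroup && s->nr_pages) {
                s->nr_pages--;
                return true;
        }
        return false;  /* caller must charge the counter instead */
}

/* Hand unconsumed pages back to the counter, converting to bytes once. */
static void drain_stock(struct stock *s)
{
        if (s->nr_pages) {
                res_usage -= s->nr_pages * PAGE_SIZE;
                s->nr_pages = 0;
        }
        s->cached_cgroup = 0;
}

int main(void)
{
        struct stock s = { .cached_cgroup = 1, .nr_pages = 31 };

        res_usage = 32 * PAGE_SIZE;  /* pretend a batch charge of 32 pages */

        while (consume_stock(&s, 1))  /* hand out cached pages one by one */
                ;
        drain_stock(&s);  /* nothing left to return in this run */
        printf("stock empty, usage still %lu pages\n", res_usage >> PAGE_SHIFT);
        return 0;
}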
* [patch 3/4] memcg: convert uncharge batching from bytes to page granularity 2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner 2011-02-09 11:01 ` [patch 1/4] memcg: keep only one charge cancelling function Johannes Weiner 2011-02-09 11:01 ` [patch 2/4] memcg: convert per-cpu stock from bytes to page granularity Johannes Weiner @ 2011-02-09 11:01 ` Johannes Weiner 2011-02-09 23:53 ` KAMEZAWA Hiroyuki 2011-02-09 11:01 ` [patch 4/4] memcg: unify charge/uncharge quantities to units of pages Johannes Weiner ` (2 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Johannes Weiner @ 2011-02-09 11:01 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel We never uncharge subpage quantities. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- include/linux/sched.h | 4 ++-- mm/memcontrol.c | 18 ++++++++++-------- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 7f7e97b..f896362 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1524,8 +1524,8 @@ struct task_struct { struct memcg_batch_info { int do_batch; /* incremented when batch uncharge started */ struct mem_cgroup *memcg; /* target memcg of uncharge */ - unsigned long bytes; /* uncharged usage */ - unsigned long memsw_bytes; /* uncharged mem+swap usage */ + unsigned long nr_pages; /* uncharged usage */ + unsigned long memsw_nr_pages; /* uncharged mem+swap usage */ } memcg_batch; #endif }; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 179fd74..ab5cd3b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2553,9 +2553,9 @@ __do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, if (batch->memcg != mem) goto direct_uncharge; /* remember freed charge and uncharge it later */ - batch->bytes += PAGE_SIZE; + batch->nr_pages++; if (uncharge_memsw) - batch->memsw_bytes += PAGE_SIZE; + batch->memsw_nr_pages++; return; direct_uncharge: res_counter_uncharge(&mem->res, page_size); @@ -2682,8 +2682,8 @@ void mem_cgroup_uncharge_start(void) /* We can do nest. */ if (current->memcg_batch.do_batch == 1) { current->memcg_batch.memcg = NULL; - current->memcg_batch.bytes = 0; - current->memcg_batch.memsw_bytes = 0; + current->memcg_batch.nr_pages = 0; + current->memcg_batch.memsw_nr_pages = 0; } } @@ -2704,10 +2704,12 @@ void mem_cgroup_uncharge_end(void) * This "batch->memcg" is valid without any css_get/put etc... * bacause we hide charges behind us. */ - if (batch->bytes) - res_counter_uncharge(&batch->memcg->res, batch->bytes); - if (batch->memsw_bytes) - res_counter_uncharge(&batch->memcg->memsw, batch->memsw_bytes); + if (batch->nr_pages) + res_counter_uncharge(&batch->memcg->res, + batch->nr_pages * PAGE_SIZE); + if (batch->memsw_nr_pages) + res_counter_uncharge(&batch->memcg->memsw, + batch->memsw_nr_pages * PAGE_SIZE); memcg_oom_recover(batch->memcg); /* forget this pointer (for sanity check) */ batch->memcg = NULL; -- 1.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [patch 3/4] memcg: convert uncharge batching from bytes to page granularity 2011-02-09 11:01 ` [patch 3/4] memcg: convert uncharge batching " Johannes Weiner @ 2011-02-09 23:53 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-02-09 23:53 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel On Wed, 9 Feb 2011 12:01:52 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > We never uncharge subpage quantities. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
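The batching change can be modeled with a few lines of userspace C (the struct and names are simplified stand-ins for the task-local memcg_batch_info): uncharges accumulate as page counts, and the multiplication by PAGE_SIZE is deferred to a single flush at the end.

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct batch_info {
        unsigned long nr_pages;        /* was "bytes" */
        unsigned long memsw_nr_pages;  /* was "memsw_bytes" */
};

static unsigned long res_usage = 100 * PAGE_SIZE;
static unsigned long memsw_usage = 100 * PAGE_SIZE;

/* Record one uncharged page; no counter traffic yet. */
static void batch_uncharge_page(struct batch_info *b, int account_memsw)
{
        b->nr_pages++;
        if (account_memsw)
                b->memsw_nr_pages++;
}

/* Flush: convert to bytes once, at the counter boundary. */
static void batch_flush(struct batch_info *b)
{
        if (b->nr_pages)
                res_usage -= b->nr_pages * PAGE_SIZE;
        if (b->memsw_nr_pages)
                memsw_usage -= b->memsw_nr_pages * PAGE_SIZE;
        b->nr_pages = b->memsw_nr_pages = 0;
}

int main(void)
{
        struct batch_info b = { 0, 0 };
        int i;

        for (i = 0; i < 16; i++)  /* e.g. truncating 16 file pages */
                batch_uncharge_page(&b, 1);
        batch_flush(&b);
        printf("usage: %lu pages, memsw: %lu pages\n",
               res_usage >> PAGE_SHIFT, memsw_usage >> PAGE_SHIFT);
        return 0;
}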
* [patch 4/4] memcg: unify charge/uncharge quantities to units of pages 2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner ` (2 preceding siblings ...) 2011-02-09 11:01 ` [patch 3/4] memcg: convert uncharge batching " Johannes Weiner @ 2011-02-09 11:01 ` Johannes Weiner 2011-02-09 23:54 ` KAMEZAWA Hiroyuki 2011-02-10 12:36 ` Johannes Weiner 2011-02-09 21:37 ` [patch 0/4] memcg: operate on page quantities internally Andrew Morton 2011-02-09 23:50 ` KAMEZAWA Hiroyuki 5 siblings, 2 replies; 15+ messages in thread From: Johannes Weiner @ 2011-02-09 11:01 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel There is no clear pattern when we pass a page count and when we pass a byte count that is a multiple of PAGE_SIZE. We never charge or uncharge subpage quantities, so convert it all to page counts. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 117 +++++++++++++++++++++++++----------------------------- 1 files changed, 54 insertions(+), 63 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ab5cd3b..bc03f56 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1083,16 +1083,16 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan, * @mem: the memory cgroup * * Returns the maximum amount of memory @mem can be charged with, in - * bytes. + * pages. */ -static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem) +static unsigned long mem_cgroup_margin(struct mem_cgroup *mem) { unsigned long long margin; margin = res_counter_margin(&mem->res); if (do_swap_account) margin = min(margin, res_counter_margin(&mem->memsw)); - return margin; + return margin >> PAGE_SHIFT; } static unsigned int get_swappiness(struct mem_cgroup *memcg) @@ -1621,7 +1621,7 @@ EXPORT_SYMBOL(mem_cgroup_update_page_stat); * size of first charge trial. "32" comes from vmscan.c's magic value. * TODO: maybe necessary to use big numbers in big irons. */ -#define CHARGE_SIZE (32 * PAGE_SIZE) +#define CHARGE_BATCH 32U struct memcg_stock_pcp { struct mem_cgroup *cached; /* this never be root cgroup */ unsigned int nr_pages; @@ -1797,8 +1797,9 @@ enum { }; static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, - int csize, bool oom_check) + unsigned int nr_pages, bool oom_check) { + unsigned long csize = nr_pages * PAGE_SIZE; struct mem_cgroup *mem_over_limit; struct res_counter *fail_res; unsigned long flags = 0; @@ -1819,14 +1820,13 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, } else mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); /* - * csize can be either a huge page (HPAGE_SIZE), a batch of - * regular pages (CHARGE_SIZE), or a single regular page - * (PAGE_SIZE). + * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch + * of regular pages (CHARGE_BATCH), or a single regular page (1). * * Never reclaim on behalf of optional batching, retry with a * single page instead. 
*/ - if (csize == CHARGE_SIZE) + if (nr_pages == CHARGE_BATCH) return CHARGE_RETRY; if (!(gfp_mask & __GFP_WAIT)) @@ -1834,7 +1834,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL, gfp_mask, flags); - if (mem_cgroup_margin(mem_over_limit) >= csize) + if (mem_cgroup_margin(mem_over_limit) >= nr_pages) return CHARGE_RETRY; /* * Even though the limit is exceeded at this point, reclaim @@ -1845,7 +1845,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, * unlikely to succeed so close to the limit, and we fall back * to regular pages anyway in case of failure. */ - if (csize == PAGE_SIZE && ret) + if (nr_pages == 1 && ret) return CHARGE_RETRY; /* @@ -1872,12 +1872,12 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, static int __mem_cgroup_try_charge(struct mm_struct *mm, gfp_t gfp_mask, struct mem_cgroup **memcg, bool oom, - int page_size) + unsigned int nr_pages) { + unsigned int batch = max(CHARGE_BATCH, nr_pages); int nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES; struct mem_cgroup *mem = NULL; int ret; - int csize = max(CHARGE_SIZE, (unsigned long) page_size); /* * Unlike gloval-vm's OOM-kill, we're not in memory shortage @@ -1902,7 +1902,7 @@ again: VM_BUG_ON(css_is_removed(&mem->css)); if (mem_cgroup_is_root(mem)) goto done; - if (page_size == PAGE_SIZE && consume_stock(mem)) + if (nr_pages == 1 && consume_stock(mem)) goto done; css_get(&mem->css); } else { @@ -1925,7 +1925,7 @@ again: rcu_read_unlock(); goto done; } - if (page_size == PAGE_SIZE && consume_stock(mem)) { + if (nr_pages == 1 && consume_stock(mem)) { /* * It seems dagerous to access memcg without css_get(). * But considering how consume_stok works, it's not @@ -1960,13 +1960,12 @@ again: nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES; } - ret = __mem_cgroup_do_charge(mem, gfp_mask, csize, oom_check); - + ret = __mem_cgroup_do_charge(mem, gfp_mask, batch, oom_check); switch (ret) { case CHARGE_OK: break; case CHARGE_RETRY: /* not in OOM situation but retry */ - csize = page_size; + batch = nr_pages; css_put(&mem->css); mem = NULL; goto again; @@ -1987,8 +1986,8 @@ again: } } while (ret != CHARGE_OK); - if (csize > page_size) - refill_stock(mem, (csize - page_size) >> PAGE_SHIFT); + if (batch > nr_pages) + refill_stock(mem, batch - nr_pages); css_put(&mem->css); done: *memcg = mem; @@ -2069,10 +2068,8 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, struct page *page, struct page_cgroup *pc, enum charge_type ctype, - int page_size) + unsigned int nr_pages) { - int nr_pages = page_size >> PAGE_SHIFT; - lock_page_cgroup(pc); if (unlikely(PageCgroupUsed(pc))) { unlock_page_cgroup(pc); @@ -2165,11 +2162,11 @@ void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail) * @from: mem_cgroup which the page is moved from. * @to: mem_cgroup which the page is moved to. @from != @to. * @uncharge: whether we should call uncharge and css_put against @from. - * @charge_size: number of bytes to charge (regular or huge page) + * @nr_pages: number of regular pages (>1 for huge pages) * * The caller must confirm following. * - page is not on LRU (isolate_page() is useful.) - * - compound_lock is held when charge_size > PAGE_SIZE + * - compound_lock is held when nr_pages > 1 * * This function doesn't do "charge" nor css_get to new cgroup. It should be * done by a caller(__mem_cgroup_try_charge would be usefull). 
If @uncharge is @@ -2178,9 +2175,8 @@ void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail) */ static int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc, struct mem_cgroup *from, struct mem_cgroup *to, - bool uncharge, int charge_size) + bool uncharge, unsigned int nr_pages) { - int nr_pages = charge_size >> PAGE_SHIFT; unsigned long flags; int ret; @@ -2193,7 +2189,7 @@ static int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc, * hold it. */ ret = -EBUSY; - if (charge_size > PAGE_SIZE && !PageTransHuge(page)) + if (nr_pages > 1 && !PageTransHuge(page)) goto out; lock_page_cgroup(pc); @@ -2251,7 +2247,7 @@ static int mem_cgroup_move_parent(struct page *page, struct cgroup *cg = child->css.cgroup; struct cgroup *pcg = cg->parent; struct mem_cgroup *parent; - int page_size = PAGE_SIZE; + unsigned int nr_pages; unsigned long flags; int ret; @@ -2265,23 +2261,21 @@ static int mem_cgroup_move_parent(struct page *page, if (isolate_lru_page(page)) goto put; - if (PageTransHuge(page)) - page_size = HPAGE_SIZE; + nr_pages = hpage_nr_pages(page); parent = mem_cgroup_from_cont(pcg); - ret = __mem_cgroup_try_charge(NULL, gfp_mask, - &parent, false, page_size); + ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, nr_pages); if (ret || !parent) goto put_back; - if (page_size > PAGE_SIZE) + if (nr_pages > 1) flags = compound_lock_irqsave(page); - ret = mem_cgroup_move_account(page, pc, child, parent, true, page_size); + ret = mem_cgroup_move_account(page, pc, child, parent, true, nr_pages); if (ret) - mem_cgroup_cancel_charge(parent, page_size >> PAGE_SHIFT); + mem_cgroup_cancel_charge(parent, nr_pages); - if (page_size > PAGE_SIZE) + if (nr_pages > 1) compound_unlock_irqrestore(page, flags); put_back: putback_lru_page(page); @@ -2301,13 +2295,13 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm, gfp_t gfp_mask, enum charge_type ctype) { struct mem_cgroup *mem = NULL; - int page_size = PAGE_SIZE; + unsigned int nr_pages = 1; struct page_cgroup *pc; bool oom = true; int ret; if (PageTransHuge(page)) { - page_size <<= compound_order(page); + nr_pages <<= compound_order(page); VM_BUG_ON(!PageTransHuge(page)); /* * Never OOM-kill a process for a huge page. 
The @@ -2319,11 +2313,11 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm, pc = lookup_page_cgroup(page); BUG_ON(!pc); /* XXX: remove this and move pc lookup into commit */ - ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, oom, page_size); + ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, oom, nr_pages); if (ret || !mem) return ret; - __mem_cgroup_commit_charge(mem, page, pc, ctype, page_size); + __mem_cgroup_commit_charge(mem, page, pc, ctype, nr_pages); return 0; } @@ -2439,13 +2433,13 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm, if (!mem) goto charge_cur_mm; *ptr = mem; - ret = __mem_cgroup_try_charge(NULL, mask, ptr, true, PAGE_SIZE); + ret = __mem_cgroup_try_charge(NULL, mask, ptr, true, 1); css_put(&mem->css); return ret; charge_cur_mm: if (unlikely(!mm)) mm = &init_mm; - return __mem_cgroup_try_charge(mm, mask, ptr, true, PAGE_SIZE); + return __mem_cgroup_try_charge(mm, mask, ptr, true, 1); } static void @@ -2461,7 +2455,7 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr, cgroup_exclude_rmdir(&ptr->css); pc = lookup_page_cgroup(page); mem_cgroup_lru_del_before_commit_swapcache(page); - __mem_cgroup_commit_charge(ptr, page, pc, ctype, PAGE_SIZE); + __mem_cgroup_commit_charge(ptr, page, pc, ctype, 1); mem_cgroup_lru_add_after_commit_swapcache(page); /* * Now swap is on-memory. This means this page may be @@ -2515,10 +2509,11 @@ void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem) static void __do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, - int page_size) + unsigned int nr_pages) { struct memcg_batch_info *batch = NULL; bool uncharge_memsw = true; + /* If swapout, usage of swap doesn't decrease */ if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) uncharge_memsw = false; @@ -2542,7 +2537,7 @@ __do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, if (!batch->do_batch || test_thread_flag(TIF_MEMDIE)) goto direct_uncharge; - if (page_size != PAGE_SIZE) + if (nr_pages > 1) goto direct_uncharge; /* @@ -2558,9 +2553,9 @@ __do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, batch->memsw_nr_pages++; return; direct_uncharge: - res_counter_uncharge(&mem->res, page_size); + res_counter_uncharge(&mem->res, nr_pages * PAGE_SIZE); if (uncharge_memsw) - res_counter_uncharge(&mem->memsw, page_size); + res_counter_uncharge(&mem->memsw, nr_pages * PAGE_SIZE); if (unlikely(batch->memcg != mem)) memcg_oom_recover(mem); return; @@ -2572,10 +2567,9 @@ direct_uncharge: static struct mem_cgroup * __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) { - int count; - struct page_cgroup *pc; struct mem_cgroup *mem = NULL; - int page_size = PAGE_SIZE; + unsigned int nr_pages = 1; + struct page_cgroup *pc; if (mem_cgroup_disabled()) return NULL; @@ -2584,11 +2578,9 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) return NULL; if (PageTransHuge(page)) { - page_size <<= compound_order(page); + nr_pages <<= compound_order(page); VM_BUG_ON(!PageTransHuge(page)); } - - count = page_size >> PAGE_SHIFT; /* * Check if our page_cgroup is valid */ @@ -2621,7 +2613,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) break; } - mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -count); + mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -nr_pages); ClearPageCgroupUsed(pc); /* @@ -2642,7 +2634,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) mem_cgroup_get(mem); } if 
(!mem_cgroup_is_root(mem)) - __do_uncharge(mem, ctype, page_size); + __do_uncharge(mem, ctype, nr_pages); return mem; @@ -2834,8 +2826,8 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, int mem_cgroup_prepare_migration(struct page *page, struct page *newpage, struct mem_cgroup **ptr, gfp_t gfp_mask) { - struct page_cgroup *pc; struct mem_cgroup *mem = NULL; + struct page_cgroup *pc; enum charge_type ctype; int ret = 0; @@ -2891,7 +2883,7 @@ int mem_cgroup_prepare_migration(struct page *page, return 0; *ptr = mem; - ret = __mem_cgroup_try_charge(NULL, gfp_mask, ptr, false, PAGE_SIZE); + ret = __mem_cgroup_try_charge(NULL, gfp_mask, ptr, false, 1); css_put(&mem->css);/* drop extra refcnt */ if (ret || *ptr == NULL) { if (PageAnon(page)) { @@ -2918,7 +2910,7 @@ int mem_cgroup_prepare_migration(struct page *page, ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; else ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM; - __mem_cgroup_commit_charge(mem, page, pc, ctype, PAGE_SIZE); + __mem_cgroup_commit_charge(mem, page, pc, ctype, 1); return ret; } @@ -4572,8 +4564,7 @@ one_by_one: batch_count = PRECHARGE_COUNT_AT_ONCE; cond_resched(); } - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, - PAGE_SIZE); + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, 1); if (ret || !mem) /* mem_cgroup_clear_mc() will do uncharge later */ return -ENOMEM; @@ -4918,8 +4909,8 @@ retry: if (isolate_lru_page(page)) goto put; pc = lookup_page_cgroup(page); - if (!mem_cgroup_move_account(page, pc, - mc.from, mc.to, false, PAGE_SIZE)) { + if (!mem_cgroup_move_account(page, pc, mc.from, + mc.to, false, 1)) { mc.precharge--; /* we uncharge from mc.from later. */ mc.moved_charge++; -- 1.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
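The arithmetic at the heart of the reworked try_charge path, reduced to a userspace sketch (the reclaim, retry and OOM handling is omitted, and the stock is a plain counter here): charge max(CHARGE_BATCH, nr_pages) in one counter transaction and park the surplus pages in the per-cpu stock.

#include <stdio.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define CHARGE_BATCH 32U  /* pages, mirroring the patched #define */

static unsigned long res_usage;   /* bytes */
static unsigned int stock_pages;  /* pre-charged pages kept locally */

static unsigned int max_u(unsigned int a, unsigned int b)
{
        return a > b ? a : b;
}

/* Charge @nr_pages, over-charging up to a batch and stashing the rest. */
static void try_charge(unsigned int nr_pages)
{
        unsigned int batch = max_u(CHARGE_BATCH, nr_pages);

        res_usage += (unsigned long)batch * PAGE_SIZE;  /* one counter update */
        if (batch > nr_pages)
                stock_pages += batch - nr_pages;  /* models refill_stock() */
}

int main(void)
{
        try_charge(1);    /* a regular page: charges 32 pages, stocks 31 */
        try_charge(512);  /* a 2MB huge page: charges exactly 512 pages */
        printf("usage: %lu pages, stock: %u pages\n",
               res_usage >> PAGE_SHIFT, stock_pages);
        return 0;
}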
* Re: [patch 4/4] memcg: unify charge/uncharge quantities to units of pages 2011-02-09 11:01 ` [patch 4/4] memcg: unify charge/uncharge quantities to units of pages Johannes Weiner @ 2011-02-09 23:54 ` KAMEZAWA Hiroyuki 2011-02-10 12:36 ` Johannes Weiner 1 sibling, 0 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-02-09 23:54 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel On Wed, 9 Feb 2011 12:01:53 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > There is no clear pattern when we pass a page count and when we pass a > byte count that is a multiple of PAGE_SIZE. > > We never charge or uncharge subpage quantities, so convert it all to > page counts. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 4/4] memcg: unify charge/uncharge quantities to units of pages 2011-02-09 11:01 ` [patch 4/4] memcg: unify charge/uncharge quantities to units of pages Johannes Weiner 2011-02-09 23:54 ` KAMEZAWA Hiroyuki @ 2011-02-10 12:36 ` Johannes Weiner 1 sibling, 0 replies; 15+ messages in thread From: Johannes Weiner @ 2011-02-10 12:36 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel The update to 1/4 made one hunk of this one no longer apply, so to save you the hassle, here is a complete replacement. What I changed to visibly break new old-API users: 1. mem_cgroup_margin: sufficiently new to not have new users developped that assume the return value to be in unit of bytes, so I left it alone 2. __mem_cgroup_do_charge: dropped the underscores 3. __mem_cgroup_try_charge: moved @nr_pages parameter so that using the old function signature would complain about passing integers for pointers and vice versa 4. __mem_cgroup_commit_charge: same as 3. 5. mem_cgroup_move_account: same as 4. 6. __do_uncharge: renamed to mem_cgroup_do_uncharge --- From: Johannes Weiner <hannes@cmpxchg.org> Subject: [patch] memcg: unify charge/uncharge quantities to units of pages There is no clear pattern when we pass a page count and when we pass a byte count that is a multiple of PAGE_SIZE. We never charge or uncharge subpage quantities, so convert it all to page counts. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 135 ++++++++++++++++++++++++++---------------------------- 1 files changed, 65 insertions(+), 70 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 63c65ab..78a79ea 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1092,16 +1092,16 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan, * @mem: the memory cgroup * * Returns the maximum amount of memory @mem can be charged with, in - * bytes. + * pages. */ -static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem) +static unsigned long mem_cgroup_margin(struct mem_cgroup *mem) { unsigned long long margin; margin = res_counter_margin(&mem->res); if (do_swap_account) margin = min(margin, res_counter_margin(&mem->memsw)); - return margin; + return margin >> PAGE_SHIFT; } static unsigned int get_swappiness(struct mem_cgroup *memcg) @@ -1637,7 +1637,7 @@ EXPORT_SYMBOL(mem_cgroup_update_page_stat); * size of first charge trial. "32" comes from vmscan.c's magic value. * TODO: maybe necessary to use big numbers in big irons. */ -#define CHARGE_SIZE (32 * PAGE_SIZE) +#define CHARGE_BATCH 32U struct memcg_stock_pcp { struct mem_cgroup *cached; /* this never be root cgroup */ unsigned int nr_pages; @@ -1812,9 +1812,10 @@ enum { CHARGE_OOM_DIE, /* the current is killed because of OOM */ }; -static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, - int csize, bool oom_check) +static int mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, + unsigned int nr_pages, bool oom_check) { + unsigned long csize = nr_pages * PAGE_SIZE; struct mem_cgroup *mem_over_limit; struct res_counter *fail_res; unsigned long flags = 0; @@ -1835,14 +1836,13 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, } else mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); /* - * csize can be either a huge page (HPAGE_SIZE), a batch of - * regular pages (CHARGE_SIZE), or a single regular page - * (PAGE_SIZE). 
+ * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch + * of regular pages (CHARGE_BATCH), or a single regular page (1). * * Never reclaim on behalf of optional batching, retry with a * single page instead. */ - if (csize == CHARGE_SIZE) + if (nr_pages == CHARGE_BATCH) return CHARGE_RETRY; if (!(gfp_mask & __GFP_WAIT)) @@ -1850,7 +1850,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL, gfp_mask, flags); - if (mem_cgroup_margin(mem_over_limit) >= csize) + if (mem_cgroup_margin(mem_over_limit) >= nr_pages) return CHARGE_RETRY; /* * Even though the limit is exceeded at this point, reclaim @@ -1861,7 +1861,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, * unlikely to succeed so close to the limit, and we fall back * to regular pages anyway in case of failure. */ - if (csize == PAGE_SIZE && ret) + if (nr_pages == 1 && ret) return CHARGE_RETRY; /* @@ -1887,13 +1887,14 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, */ static int __mem_cgroup_try_charge(struct mm_struct *mm, gfp_t gfp_mask, - struct mem_cgroup **memcg, bool oom, - int page_size) + unsigned int nr_pages, + struct mem_cgroup **memcg, + bool oom) { + unsigned int batch = max(CHARGE_BATCH, nr_pages); int nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES; struct mem_cgroup *mem = NULL; int ret; - int csize = max(CHARGE_SIZE, (unsigned long) page_size); /* * Unlike gloval-vm's OOM-kill, we're not in memory shortage @@ -1918,7 +1919,7 @@ again: VM_BUG_ON(css_is_removed(&mem->css)); if (mem_cgroup_is_root(mem)) goto done; - if (page_size == PAGE_SIZE && consume_stock(mem)) + if (nr_pages == 1 && consume_stock(mem)) goto done; css_get(&mem->css); } else { @@ -1941,7 +1942,7 @@ again: rcu_read_unlock(); goto done; } - if (page_size == PAGE_SIZE && consume_stock(mem)) { + if (nr_pages == 1 && consume_stock(mem)) { /* * It seems dagerous to access memcg without css_get(). * But considering how consume_stok works, it's not @@ -1976,13 +1977,12 @@ again: nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES; } - ret = __mem_cgroup_do_charge(mem, gfp_mask, csize, oom_check); - + ret = mem_cgroup_do_charge(mem, gfp_mask, batch, oom_check); switch (ret) { case CHARGE_OK: break; case CHARGE_RETRY: /* not in OOM situation but retry */ - csize = page_size; + batch = nr_pages; css_put(&mem->css); mem = NULL; goto again; @@ -2003,8 +2003,8 @@ again: } } while (ret != CHARGE_OK); - if (csize > page_size) - refill_stock(mem, (csize - page_size) >> PAGE_SHIFT); + if (batch > nr_pages) + refill_stock(mem, batch - nr_pages); css_put(&mem->css); done: *memcg = mem; @@ -2083,12 +2083,10 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) static void __mem_cgroup_commit_charge(struct mem_cgroup *mem, struct page *page, + unsigned int nr_pages, struct page_cgroup *pc, - enum charge_type ctype, - int page_size) + enum charge_type ctype) { - int nr_pages = page_size >> PAGE_SHIFT; - lock_page_cgroup(pc); if (unlikely(PageCgroupUsed(pc))) { unlock_page_cgroup(pc); @@ -2177,26 +2175,28 @@ void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail) /** * mem_cgroup_move_account - move account of the page * @page: the page + * @nr_pages: number of regular pages (>1 for huge pages) * @pc: page_cgroup of the page. * @from: mem_cgroup which the page is moved from. * @to: mem_cgroup which the page is moved to. @from != @to. * @uncharge: whether we should call uncharge and css_put against @from. 
- * @charge_size: number of bytes to charge (regular or huge page) * * The caller must confirm following. * - page is not on LRU (isolate_page() is useful.) - * - compound_lock is held when charge_size > PAGE_SIZE + * - compound_lock is held when nr_pages > 1 * * This function doesn't do "charge" nor css_get to new cgroup. It should be * done by a caller(__mem_cgroup_try_charge would be usefull). If @uncharge is * true, this function does "uncharge" from old cgroup, but it doesn't if * @uncharge is false, so a caller should do "uncharge". */ -static int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc, - struct mem_cgroup *from, struct mem_cgroup *to, - bool uncharge, int charge_size) +static int mem_cgroup_move_account(struct page *page, + unsigned int nr_pages, + struct page_cgroup *pc, + struct mem_cgroup *from, + struct mem_cgroup *to, + bool uncharge) { - int nr_pages = charge_size >> PAGE_SHIFT; unsigned long flags; int ret; @@ -2209,7 +2209,7 @@ static int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc, * hold it. */ ret = -EBUSY; - if (charge_size > PAGE_SIZE && !PageTransHuge(page)) + if (nr_pages > 1 && !PageTransHuge(page)) goto out; lock_page_cgroup(pc); @@ -2267,7 +2267,7 @@ static int mem_cgroup_move_parent(struct page *page, struct cgroup *cg = child->css.cgroup; struct cgroup *pcg = cg->parent; struct mem_cgroup *parent; - int page_size = PAGE_SIZE; + unsigned int nr_pages; unsigned long flags; int ret; @@ -2281,23 +2281,21 @@ static int mem_cgroup_move_parent(struct page *page, if (isolate_lru_page(page)) goto put; - if (PageTransHuge(page)) - page_size = HPAGE_SIZE; + nr_pages = hpage_nr_pages(page); parent = mem_cgroup_from_cont(pcg); - ret = __mem_cgroup_try_charge(NULL, gfp_mask, - &parent, false, page_size); + ret = __mem_cgroup_try_charge(NULL, gfp_mask, nr_pages, &parent, false); if (ret || !parent) goto put_back; - if (page_size > PAGE_SIZE) + if (nr_pages > 1) flags = compound_lock_irqsave(page); - ret = mem_cgroup_move_account(page, pc, child, parent, true, page_size); + ret = mem_cgroup_move_account(page, nr_pages, pc, child, parent, true); if (ret) - __mem_cgroup_cancel_charge(parent, page_size >> PAGE_SHIFT); + __mem_cgroup_cancel_charge(parent, nr_pages); - if (page_size > PAGE_SIZE) + if (nr_pages > 1) compound_unlock_irqrestore(page, flags); put_back: putback_lru_page(page); @@ -2317,13 +2315,13 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm, gfp_t gfp_mask, enum charge_type ctype) { struct mem_cgroup *mem = NULL; - int page_size = PAGE_SIZE; + unsigned int nr_pages = 1; struct page_cgroup *pc; bool oom = true; int ret; if (PageTransHuge(page)) { - page_size <<= compound_order(page); + nr_pages <<= compound_order(page); VM_BUG_ON(!PageTransHuge(page)); /* * Never OOM-kill a process for a huge page. 
The @@ -2335,11 +2333,11 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm, pc = lookup_page_cgroup(page); BUG_ON(!pc); /* XXX: remove this and move pc lookup into commit */ - ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, oom, page_size); + ret = __mem_cgroup_try_charge(mm, gfp_mask, nr_pages, &mem, oom); if (ret || !mem) return ret; - __mem_cgroup_commit_charge(mem, page, pc, ctype, page_size); + __mem_cgroup_commit_charge(mem, page, nr_pages, pc, ctype); return 0; } @@ -2455,13 +2453,13 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm, if (!mem) goto charge_cur_mm; *ptr = mem; - ret = __mem_cgroup_try_charge(NULL, mask, ptr, true, PAGE_SIZE); + ret = __mem_cgroup_try_charge(NULL, mask, 1, ptr, true); css_put(&mem->css); return ret; charge_cur_mm: if (unlikely(!mm)) mm = &init_mm; - return __mem_cgroup_try_charge(mm, mask, ptr, true, PAGE_SIZE); + return __mem_cgroup_try_charge(mm, mask, 1, ptr, true); } static void @@ -2477,7 +2475,7 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr, cgroup_exclude_rmdir(&ptr->css); pc = lookup_page_cgroup(page); mem_cgroup_lru_del_before_commit_swapcache(page); - __mem_cgroup_commit_charge(ptr, page, pc, ctype, PAGE_SIZE); + __mem_cgroup_commit_charge(ptr, page, 1, pc, ctype); mem_cgroup_lru_add_after_commit_swapcache(page); /* * Now swap is on-memory. This means this page may be @@ -2529,12 +2527,13 @@ void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem) __mem_cgroup_cancel_charge(mem, 1); } -static void -__do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, - int page_size) +static void mem_cgroup_do_uncharge(struct mem_cgroup *mem, + unsigned int nr_pages, + const enum charge_type ctype) { struct memcg_batch_info *batch = NULL; bool uncharge_memsw = true; + /* If swapout, usage of swap doesn't decrease */ if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) uncharge_memsw = false; @@ -2558,7 +2557,7 @@ __do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, if (!batch->do_batch || test_thread_flag(TIF_MEMDIE)) goto direct_uncharge; - if (page_size != PAGE_SIZE) + if (nr_pages > 1) goto direct_uncharge; /* @@ -2574,9 +2573,9 @@ __do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype, batch->memsw_nr_pages++; return; direct_uncharge: - res_counter_uncharge(&mem->res, page_size); + res_counter_uncharge(&mem->res, nr_pages * PAGE_SIZE); if (uncharge_memsw) - res_counter_uncharge(&mem->memsw, page_size); + res_counter_uncharge(&mem->memsw, nr_pages * PAGE_SIZE); if (unlikely(batch->memcg != mem)) memcg_oom_recover(mem); return; @@ -2588,10 +2587,9 @@ direct_uncharge: static struct mem_cgroup * __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) { - int count; - struct page_cgroup *pc; struct mem_cgroup *mem = NULL; - int page_size = PAGE_SIZE; + unsigned int nr_pages = 1; + struct page_cgroup *pc; if (mem_cgroup_disabled()) return NULL; @@ -2600,11 +2598,9 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) return NULL; if (PageTransHuge(page)) { - page_size <<= compound_order(page); + nr_pages <<= compound_order(page); VM_BUG_ON(!PageTransHuge(page)); } - - count = page_size >> PAGE_SHIFT; /* * Check if our page_cgroup is valid */ @@ -2637,7 +2633,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) break; } - mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -count); + mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -nr_pages); 
ClearPageCgroupUsed(pc); /* @@ -2658,7 +2654,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) mem_cgroup_get(mem); } if (!mem_cgroup_is_root(mem)) - __do_uncharge(mem, ctype, page_size); + mem_cgroup_do_uncharge(mem, nr_pages, ctype); return mem; @@ -2850,8 +2846,8 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, int mem_cgroup_prepare_migration(struct page *page, struct page *newpage, struct mem_cgroup **ptr, gfp_t gfp_mask) { - struct page_cgroup *pc; struct mem_cgroup *mem = NULL; + struct page_cgroup *pc; enum charge_type ctype; int ret = 0; @@ -2907,7 +2903,7 @@ int mem_cgroup_prepare_migration(struct page *page, return 0; *ptr = mem; - ret = __mem_cgroup_try_charge(NULL, gfp_mask, ptr, false, PAGE_SIZE); + ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, ptr, false); css_put(&mem->css);/* drop extra refcnt */ if (ret || *ptr == NULL) { if (PageAnon(page)) { @@ -2934,7 +2930,7 @@ int mem_cgroup_prepare_migration(struct page *page, ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; else ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM; - __mem_cgroup_commit_charge(mem, page, pc, ctype, PAGE_SIZE); + __mem_cgroup_commit_charge(mem, page, 1, pc, ctype); return ret; } @@ -4591,8 +4587,7 @@ one_by_one: batch_count = PRECHARGE_COUNT_AT_ONCE; cond_resched(); } - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, - PAGE_SIZE); + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, 1, &mem, false); if (ret || !mem) /* mem_cgroup_clear_mc() will do uncharge later */ return -ENOMEM; @@ -4937,8 +4932,8 @@ retry: if (isolate_lru_page(page)) goto put; pc = lookup_page_cgroup(page); - if (!mem_cgroup_move_account(page, pc, - mc.from, mc.to, false, PAGE_SIZE)) { + if (!mem_cgroup_move_account(page, 1, pc, + mc.from, mc.to, false)) { mc.precharge--; /* we uncharge from mc.from later. */ mc.moved_charge++; -- 1.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
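The hunks above all follow one pattern: compute nr_pages once (1 for a regular page, 1 << compound_order(page) for THP), pass page counts through the charge/uncharge paths, and translate to bytes only when talking to res_counter. Below is a minimal user-space sketch of that idiom; it is illustrative only, with made-up names, a plain struct standing in for res_counter, and a 2 MB huge page order assumed.

```c
/*
 * Illustrative sketch only -- not kernel code.  Internal accounting is
 * kept in page quantities; bytes appear only at the counter boundary.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE   4096UL
#define HPAGE_ORDER 9		/* assumed: 2 MB huge pages, 512 base pages */

struct counter {
	unsigned long usage;	/* bytes, like res_counter */
};

static void charge_pages(struct counter *res, unsigned int nr_pages)
{
	/* the only place that multiplies by PAGE_SIZE */
	res->usage += (unsigned long)nr_pages * PAGE_SIZE;
}

static void uncharge_pages(struct counter *res, unsigned int nr_pages)
{
	res->usage -= (unsigned long)nr_pages * PAGE_SIZE;
}

int main(void)
{
	struct counter res = { 0 };
	bool huge = true;		/* cf. PageTransHuge(page) */
	unsigned int nr_pages = 1;

	if (huge)			/* cf. nr_pages <<= compound_order(page) */
		nr_pages <<= HPAGE_ORDER;

	charge_pages(&res, nr_pages);
	printf("charged %u pages = %lu bytes\n", nr_pages, res.usage);
	uncharge_pages(&res, nr_pages);
	return 0;
}
```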
* Re: [patch 0/4] memcg: operate on page quantities internally
  2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner
                   ` (3 preceding siblings ...)
  2011-02-09 11:01 ` [patch 4/4] memcg: unify charge/uncharge quantities to units of pages Johannes Weiner
@ 2011-02-09 21:37 ` Andrew Morton
  2011-02-10 12:40   ` Johannes Weiner
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2011-02-09 21:37 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel

On Wed, 9 Feb 2011 12:01:49 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Hi,
>
> this patch set converts the memcg charge and uncharge paths to operate
> on multiples of pages instead of bytes. It already was a good idea
> before, but with the merge of THP we made a real mess by specifying
> huge pages alternatingly in bytes or in number of regular pages.
>
> If I did not miss anything, this should leave only res_counter and
> user-visible stuff in bytes. The ABI probably won't change, so next
> up is converting res_counter to operate on page quantities.

I worry that there will be unconverted code and we'll end up adding
bugs.

A way to minimise the risk is to force compilation errors and warnings:
rename fields and functions, reorder function arguments. Did your
patches do this as much as they could have?
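The kind of forced breakage Andrew is asking about can be shown with a small, hypothetical C sketch (not code from this series; the function names and the struct counter type are made up): converting the interface to pages and renaming it turns every stale, byte-based call site into a compile error instead of a silent factor-of-PAGE_SIZE accounting bug.

```c
/*
 * Hypothetical illustration only -- not from the patches.  The point:
 * change the unit *and* the name, so leftover callers fail to build
 * rather than silently charging PAGE_SIZE times too much.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

struct counter { unsigned long usage; };	/* stand-in for res_counter */

/* new interface: counts pages and carries a new name */
static void mem_charge_pages(struct counter *res, unsigned int nr_pages)
{
	res->usage += (unsigned long)nr_pages * PAGE_SIZE;
}

int main(void)
{
	struct counter res = { 0 };

	/* mem_charge(&res, PAGE_SIZE);  old byte interface: no longer compiles */
	mem_charge_pages(&res, 1);	/* converted call site: one page */
	printf("usage = %lu bytes\n", res.usage);
	return 0;
}
```

Keeping the old name while silently changing the unit would still compile, which is presumably why the series also renames helpers (for example, __do_uncharge becomes mem_cgroup_do_uncharge in the hunks above).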
* Re: [patch 0/4] memcg: operate on page quantities internally
  2011-02-09 21:37 ` [patch 0/4] memcg: operate on page quantities internally Andrew Morton
@ 2011-02-10 12:40   ` Johannes Weiner
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2011-02-10 12:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel

On Wed, Feb 09, 2011 at 01:37:57PM -0800, Andrew Morton wrote:
> On Wed, 9 Feb 2011 12:01:49 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > Hi,
> >
> > this patch set converts the memcg charge and uncharge paths to operate
> > on multiples of pages instead of bytes. It already was a good idea
> > before, but with the merge of THP we made a real mess by specifying
> > huge pages alternatingly in bytes or in number of regular pages.
> >
> > If I did not miss anything, this should leave only res_counter and
> > user-visible stuff in bytes. The ABI probably won't change, so next
> > up is converting res_counter to operate on page quantities.
>
> I worry that there will be unconverted code and we'll end up adding
> bugs.
>
> A way to minimise the risk is to force compilation errors and warnings:
> rename fields and functions, reorder function arguments. Did your
> patches do this as much as they could have?

I sent you fixes/replacements for 1/4 and 4/4. 2/4 and 3/4 adjusted
the names of changed structure members already.
* Re: [patch 0/4] memcg: operate on page quantities internally
  2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner
                   ` (4 preceding siblings ...)
  2011-02-09 21:37 ` [patch 0/4] memcg: operate on page quantities internally Andrew Morton
@ 2011-02-09 23:50 ` KAMEZAWA Hiroyuki
  2011-02-10 12:42   ` Johannes Weiner
  5 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-09 23:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel

On Wed, 9 Feb 2011 12:01:49 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Hi,
>
> this patch set converts the memcg charge and uncharge paths to operate
> on multiples of pages instead of bytes. It already was a good idea
> before, but with the merge of THP we made a real mess by specifying
> huge pages alternatingly in bytes or in number of regular pages.
>
> If I did not miss anything, this should leave only res_counter and
> user-visible stuff in bytes. The ABI probably won't change, so next
> up is converting res_counter to operate on page quantities.
>

Hmm, I think this should be done, but I also think it should be postponed.
Because, IIUC, some guys will try to discuss charging against kernel objects
at the next mm-summit. IMHO, it will be done against PAGE not against
Object even if we do kernel object accounting. So this patch is okay for me.
But I think it's better to go ahead after we confirm the way we go.
What do you think?

Anyway, I welcome this patch.

Thanks,
-Kame

> Hannes
>
> include/linux/sched.h | 4 +-
> mm/memcontrol.c | 157 ++++++++++++++++++++++++-------------------------
> 2 files changed, 78 insertions(+), 83 deletions(-)
>
* Re: [patch 0/4] memcg: operate on page quantities internally
  2011-02-09 23:50 ` KAMEZAWA Hiroyuki
@ 2011-02-10 12:42   ` Johannes Weiner
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2011-02-10 12:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel

On Thu, Feb 10, 2011 at 08:50:34AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Feb 2011 12:01:49 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > If I did not miss anything, this should leave only res_counter and
> > user-visible stuff in bytes. The ABI probably won't change, so next
> > up is converting res_counter to operate on page quantities.
>
> Hmm, I think this should be done, but I also think it should be postponed.
> Because, IIUC, some guys will try to discuss charging against kernel objects
> at the next mm-summit. IMHO, it will be done against PAGE not against
> Object even if we do kernel object accounting. So this patch is okay for me.
> But I think it's better to go ahead after we confirm the way we go.
> What do you think?

That makes sense, let's leave res_counter alone until we have hashed
this out.

> Anyway, I welcome this patch.

Thanks for reviewing,
Hannes
end of thread, other threads:[~2011-02-10 12:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed):
  2011-02-09 11:01 [patch 0/4] memcg: operate on page quantities internally Johannes Weiner
  2011-02-09 11:01 ` [patch 1/4] memcg: keep only one charge cancelling function Johannes Weiner
  2011-02-09 23:51   ` KAMEZAWA Hiroyuki
  2011-02-10 12:26     ` Johannes Weiner
  2011-02-09 11:01 ` [patch 2/4] memcg: convert per-cpu stock from bytes to page granularity Johannes Weiner
  2011-02-09 23:52   ` KAMEZAWA Hiroyuki
  2011-02-09 11:01 ` [patch 3/4] memcg: convert uncharge batching " Johannes Weiner
  2011-02-09 23:53   ` KAMEZAWA Hiroyuki
  2011-02-09 11:01 ` [patch 4/4] memcg: unify charge/uncharge quantities to units of pages Johannes Weiner
  2011-02-09 23:54   ` KAMEZAWA Hiroyuki
  2011-02-10 12:36     ` Johannes Weiner
  2011-02-09 21:37 ` [patch 0/4] memcg: operate on page quantities internally Andrew Morton
  2011-02-10 12:40   ` Johannes Weiner
  2011-02-09 23:50 ` KAMEZAWA Hiroyuki
  2011-02-10 12:42   ` Johannes Weiner