* [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.
  2012-02-06 10:06 [RFC] [PATCH 0/6 v3] memcg: page cgroup diet KAMEZAWA Hiroyuki
@ 2012-02-06 10:09 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-06 10:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hannes@cmpxchg.org, Michal Hocko, Ying Han, Hugh Dickins,
	akpm@linux-foundation.org

From 94b17cbc95e068a0a841c84fb0345f48a2a27d24 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 11:49:59 +0900
Subject: [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.

PCG_MOVE_LOCK is used as a bit spinlock to avoid races between overwriting
pc->mem_cgroup and per-memcg page statistics accounting.
The lock helps to avoid the race, but the race itself is very rare because
moving tasks between cgroups is not a usual job, so using 1 bit per page
for it seems too costly.

This patch changes this lock into a per-memcg spinlock and removes
PCG_MOVE_LOCK.

If a finer-grained lock is required, we'll be able to add some hashing,
but I'd like to start with this.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   19 -------------------
 mm/memcontrol.c             |   34 ++++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 1060292..7a3af74 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -7,7 +7,6 @@ enum {
 	PCG_USED, /* this object is in use. */
 	PCG_MIGRATION, /* under page migration */
 	/* flags for mem_cgroup and file and I/O status */
-	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
 	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
 	__NR_PCG_FLAGS,
 };
@@ -89,24 +88,6 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
 	bit_spin_unlock(PCG_LOCK, &pc->flags);
 }
 
-static inline void move_lock_page_cgroup(struct page_cgroup *pc,
-	unsigned long *flags)
-{
-	/*
-	 * We know updates to pc->flags of page cache's stats are from both of
-	 * usual context or IRQ context. Disable IRQ to avoid deadlock.
-	 */
-	local_irq_save(*flags);
-	bit_spin_lock(PCG_MOVE_LOCK, &pc->flags);
-}
-
-static inline void move_unlock_page_cgroup(struct page_cgroup *pc,
-	unsigned long *flags)
-{
-	bit_spin_unlock(PCG_MOVE_LOCK, &pc->flags);
-	local_irq_restore(*flags);
-}
-
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct page_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4ba0d76..083154d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -280,6 +280,8 @@ struct mem_cgroup {
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
 	atomic_t	moving_account;
+	/* taken only while moving_account > 0 */
+	spinlock_t	move_lock;
 	/*
 	 * percpu counter.
 	 */
@@ -1338,6 +1340,34 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 	return false;
 }
 
+/*
+ * Take this lock when
+ * - a code tries to modify page's memcg while it's USED.
+ * - a code tries to modify page state accounting in a memcg.
+ * see mem_cgroup_stealed(), too.
+ */
+static void move_lock_page_cgroup(struct page_cgroup *pc, unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+
+again:
+	memcg = pc->mem_cgroup;
+	spin_lock_irqsave(&memcg->move_lock, *flags);
+	if (unlikely(pc->mem_cgroup != memcg)) {
+		spin_unlock_irqrestore(&memcg->move_lock, *flags);
+		goto again;
+	}
+}
+
+static void move_unlock_page_cgroup(struct page_cgroup *pc,
+				unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = pc->mem_cgroup;
+	spin_unlock_irqrestore(&memcg->move_lock, *flags);
+}
+
 /**
  * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
  * @memcg: The memory cgroup that went over limit
@@ -2435,8 +2465,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
-#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MOVE_LOCK) |\
-			(1 << PCG_MIGRATION))
+#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
@@ -4923,6 +4952,7 @@ mem_cgroup_create(struct cgroup *cont)
 	atomic_set(&memcg->refcnt, 1);
 	memcg->move_charge_at_immigrate = 0;
 	mutex_init(&memcg->thresholds_lock);
+	spin_lock_init(&memcg->move_lock);
 	return &memcg->css;
 free_out:
 	__mem_cgroup_free(memcg);
-- 
1.7.4.1




* [PATCH 0/6] page cgroup diet v5
@ 2012-02-17  9:24 KAMEZAWA Hiroyuki
  2012-02-17  9:25 ` [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:24 UTC (permalink / raw)
  To: linux-mm@kvack.org
  Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, Michal Hocko, Hugh Dickins, Greg Thelen,
	Ying Han


This patch set removes two flags, PCG_FILE_MAPPED and PCG_MOVE_LOCK, from
page_cgroup->flags. After this, page_cgroup has only 3 bits of flags.
The set also introduces a new method for updating per-memcg page status
accounting. With it, we don't have to add new flags to page_cgroup when
'struct page' already has the information, which helps avoid new
page_cgroup flags in the future.

Fixed the parts that were pointed out:
 - added more comments
 - fixed texts
 - removed redundant arguments

Passed some tests on 3.3.0-rc3-next-20120216.

Thanks,
-Kame



* [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
@ 2012-02-17  9:25 ` KAMEZAWA Hiroyuki
  2012-02-20  8:54   ` Johannes Weiner
  2012-02-17  9:26 ` [PATCH 2/6] memcg: simplify move_account() check KAMEZAWA Hiroyuki
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

From 7075e575cc6ee383f8fd161ba44c44a60c295d5f Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 12:05:41 +0900
Subject: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)

From the log, I guess the EXPORT was in preparation for dirty accounting.
But we don't need to export this _now_. Remove it for now.

Reviewed-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab315ab..4c2b759 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1897,7 +1897,6 @@ out:
 		move_unlock_page_cgroup(pc, &flags);
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(mem_cgroup_update_page_stat);
 
 /*
  * size of first charge trial. "32" comes from vmscan.c's magic value.
-- 
1.7.4.1




* [PATCH 2/6] memcg: simplify move_account() check.
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
  2012-02-17  9:25 ` [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
@ 2012-02-17  9:26 ` KAMEZAWA Hiroyuki
  2012-02-20  8:55   ` Johannes Weiner
  2012-02-17  9:26 ` [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

From 3b6620772d7fd7b2126d5253eafb6afaf4ed6e34 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 10:02:39 +0900
Subject: [PATCH 2/6] memcg: simplify move_account() check.

In memcg, to avoid taking a lock with IRQs disabled when accessing
page_cgroup, a combination of a flag and rcu_read_lock() is used. It
works as follows:

     CPU-A                     CPU-B
                             rcu_read_lock()
    set flag
                             if(flag is set)
                                   take heavy lock
                             do job.
    synchronize_rcu()        rcu_read_unlock()
    take heavy lock.

In a recent discussion, it was argued that using a per-cpu value for
this flag just complicates the code because 'set flag' is very rare.

This patch changes the 'flag' implementation from percpu to atomic_t,
which is much simpler.
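
Here is a minimal, self-contained sketch of the pattern (illustrative
names only, not the memcg code itself; the "heavy lock" at this point in
the series is still the per-page PCG_MOVE_LOCK bit spinlock):

#include <linux/atomic.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

static atomic_t moving_account = ATOMIC_INIT(0);
static DEFINE_SPINLOCK(heavy_lock);

/* CPU-A: the rare path (account mover). */
static void start_move(void)
{
	atomic_inc(&moving_account);
	/*
	 * Wait until all pre-existing RCU read-side sections finish.
	 * After this, every new reader is guaranteed to see the flag.
	 */
	synchronize_rcu();
	/* ... now safe to take the heavy lock and move accounting ... */
}

static void end_move(void)
{
	atomic_dec(&moving_account);
}

/* CPU-B: the hot path (stat updater). */
static void update_stat(void)
{
	rcu_read_lock();
	if (atomic_read(&moving_account) > 0) {
		spin_lock(&heavy_lock);
		/* ... do the update under the heavy lock ... */
		spin_unlock(&heavy_lock);
	} else {
		/* common case: no mover exists, lockless update is safe */
	}
	rcu_read_unlock();
}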

Changelog v5.
 - removed redundant ().
 - updated patch description.

Changelog: v4
 - fixed many typos.
 - fixed return value to be bool
 - add comments.
Changelog: v3
 - this is a new patch since v3.

Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   70 +++++++++++++++++++++++-------------------------------
 1 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4c2b759..9c94389 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -89,7 +89,6 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
 	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
 	MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
-	MEM_CGROUP_ON_MOVE,	/* someone is moving account between groups */
 	MEM_CGROUP_STAT_NSTATS,
 };
 
@@ -278,6 +277,10 @@ struct mem_cgroup {
 	 */
 	unsigned long 	move_charge_at_immigrate;
 	/*
+	 * set > 0 if pages under this cgroup are moving to other cgroup.
+	 */
+	atomic_t	moving_account;
+	/*
 	 * percpu counter.
 	 */
 	struct mem_cgroup_stat_cpu *stat;
@@ -1253,36 +1256,37 @@ int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 
 	return memcg->swappiness;
 }
+/*
+ * memcg->moving_account is used for checking possibility that some thread is
+ * calling move_account(). When a thread on CPU-A starts moving pages under
+ * a memcg, other threads should check memcg->moving_account under
+ * rcu_read_lock(), like this:
+ *
+ *         CPU-A                                    CPU-B
+ *                                              rcu_read_lock()
+ *         memcg->moving_account+1              if (memcg->moving_account)
+ *                                                   take heavy locks.
+ *         synchronize_rcu()                    update something.
+ *                                              rcu_read_unlock()
+ *         start move here.
+ */
 
 static void mem_cgroup_start_move(struct mem_cgroup *memcg)
 {
-	int cpu;
-
-	get_online_cpus();
-	spin_lock(&memcg->pcp_counter_lock);
-	for_each_online_cpu(cpu)
-		per_cpu(memcg->stat->count[MEM_CGROUP_ON_MOVE], cpu) += 1;
-	memcg->nocpu_base.count[MEM_CGROUP_ON_MOVE] += 1;
-	spin_unlock(&memcg->pcp_counter_lock);
-	put_online_cpus();
-
+	atomic_inc(&memcg->moving_account);
 	synchronize_rcu();
 }
 
 static void mem_cgroup_end_move(struct mem_cgroup *memcg)
 {
-	int cpu;
-
-	if (!memcg)
-		return;
-	get_online_cpus();
-	spin_lock(&memcg->pcp_counter_lock);
-	for_each_online_cpu(cpu)
-		per_cpu(memcg->stat->count[MEM_CGROUP_ON_MOVE], cpu) -= 1;
-	memcg->nocpu_base.count[MEM_CGROUP_ON_MOVE] -= 1;
-	spin_unlock(&memcg->pcp_counter_lock);
-	put_online_cpus();
+	/*
+	 * Now, mem_cgroup_clear_mc() may call this function with NULL.
+	 * We check NULL in callee rather than caller.
+	 */
+	if (memcg)
+		atomic_dec(&memcg->moving_account);
 }
+
 /*
  * 2 routines for checking "mem" is under move_account() or not.
  *
@@ -1298,7 +1302,7 @@ static void mem_cgroup_end_move(struct mem_cgroup *memcg)
 static bool mem_cgroup_stealed(struct mem_cgroup *memcg)
 {
 	VM_BUG_ON(!rcu_read_lock_held());
-	return this_cpu_read(memcg->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
+	return atomic_read(&memcg->moving_account) > 0;
 }
 
 static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
@@ -1849,8 +1853,8 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
  * by flags.
  *
  * Considering "move", this is an only case we see a race. To make the race
- * small, we check MEM_CGROUP_ON_MOVE percpu value and detect there are
- * possibility of race condition. If there is, we take a lock.
+ * small, we check memcg->moving_account and detect the possibility of a race.
+ * If there is one, we take a lock.
  */
 
 void mem_cgroup_update_page_stat(struct page *page,
@@ -2067,17 +2071,6 @@ static void mem_cgroup_drain_pcp_counter(struct mem_cgroup *memcg, int cpu)
 		per_cpu(memcg->stat->events[i], cpu) = 0;
 		memcg->nocpu_base.events[i] += x;
 	}
-	/* need to clear ON_MOVE value, works as a kind of lock. */
-	per_cpu(memcg->stat->count[MEM_CGROUP_ON_MOVE], cpu) = 0;
-	spin_unlock(&memcg->pcp_counter_lock);
-}
-
-static void synchronize_mem_cgroup_on_move(struct mem_cgroup *memcg, int cpu)
-{
-	int idx = MEM_CGROUP_ON_MOVE;
-
-	spin_lock(&memcg->pcp_counter_lock);
-	per_cpu(memcg->stat->count[idx], cpu) = memcg->nocpu_base.count[idx];
 	spin_unlock(&memcg->pcp_counter_lock);
 }
 
@@ -2089,11 +2082,8 @@ static int __cpuinit memcg_cpu_hotplug_callback(struct notifier_block *nb,
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *iter;
 
-	if ((action == CPU_ONLINE)) {
-		for_each_mem_cgroup(iter)
-			synchronize_mem_cgroup_on_move(iter, cpu);
+	if (action == CPU_ONLINE)
 		return NOTIFY_OK;
-	}
 
 	if ((action != CPU_DEAD) || action != CPU_DEAD_FROZEN)
 		return NOTIFY_OK;
-- 
1.7.4.1




* [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
  2012-02-17  9:25 ` [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
  2012-02-17  9:26 ` [PATCH 2/6] memcg: simplify move_account() check KAMEZAWA Hiroyuki
@ 2012-02-17  9:26 ` KAMEZAWA Hiroyuki
  2012-02-20  8:56   ` Johannes Weiner
  2012-02-17  9:27 ` [PATCH 4/6] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

From 62a96c625be0c30fc5828d88685b6873ed254060 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 11:49:59 +0900
Subject: [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.

PCG_MOVE_LOCK is used as a bit spinlock to avoid races between overwriting
pc->mem_cgroup and per-memcg page statistics accounting.
The lock helps to avoid the race, but the race itself is very rare because
moving tasks between cgroups is not a usual job, so using 1 bit per page
for it seems too costly.

This patch changes this lock into a per-memcg spinlock and removes
PCG_MOVE_LOCK.

If a finer-grained lock is required, we'll be able to add some hashing,
but I'd like to start with this.
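
The subtle part, visible in the mem_cgroup_update_page_stat() hunk below,
is that pc->mem_cgroup can change between reading it and taking the
per-memcg lock, so the reader must recheck after locking and retry. A
minimal sketch of that idiom (the helper name here is illustrative, not
part of the patch):

/* Lock pc's memcg and make sure it is still pc's memcg. */
static struct mem_cgroup *lock_stable_memcg(struct page_cgroup *pc,
					    unsigned long *flags)
{
	struct mem_cgroup *memcg;
again:
	memcg = pc->mem_cgroup;			/* racy read */
	spin_lock_irqsave(&memcg->move_lock, *flags);
	if (unlikely(pc->mem_cgroup != memcg)) {
		/* the page moved to another memcg before we got the lock */
		spin_unlock_irqrestore(&memcg->move_lock, *flags);
		goto again;
	}
	return memcg;	/* stable until spin_unlock_irqrestore() */
}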

Changelog:
  - changed to pass memcg as an argument rather than page_cgroup,
    and renamed move_lock_page_cgroup() to move_lock_mem_cgroup().

Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   19 -------------------
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++----------
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 1060292..7a3af74 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -7,7 +7,6 @@ enum {
 	PCG_USED, /* this object is in use. */
 	PCG_MIGRATION, /* under page migration */
 	/* flags for mem_cgroup and file and I/O status */
-	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
 	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
 	__NR_PCG_FLAGS,
 };
@@ -89,24 +88,6 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
 	bit_spin_unlock(PCG_LOCK, &pc->flags);
 }
 
-static inline void move_lock_page_cgroup(struct page_cgroup *pc,
-	unsigned long *flags)
-{
-	/*
-	 * We know updates to pc->flags of page cache's stats are from both of
-	 * usual context or IRQ context. Disable IRQ to avoid deadlock.
-	 */
-	local_irq_save(*flags);
-	bit_spin_lock(PCG_MOVE_LOCK, &pc->flags);
-}
-
-static inline void move_unlock_page_cgroup(struct page_cgroup *pc,
-	unsigned long *flags)
-{
-	bit_spin_unlock(PCG_MOVE_LOCK, &pc->flags);
-	local_irq_restore(*flags);
-}
-
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct page_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c94389..7065028 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -280,6 +280,8 @@ struct mem_cgroup {
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
 	atomic_t	moving_account;
+	/* taken only while moving_account > 0 */
+	spinlock_t	move_lock;
 	/*
 	 * percpu counter.
 	 */
@@ -1343,6 +1345,24 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 	return false;
 }
 
+/*
+ * Take this lock when
+ * - a code tries to modify page's memcg while it's USED.
+ * - a code tries to modify page state accounting in a memcg.
+ * see mem_cgroup_stealed(), too.
+ */
+static void move_lock_mem_cgroup(struct mem_cgroup *memcg,
+				  unsigned long *flags)
+{
+	spin_lock_irqsave(&memcg->move_lock, *flags);
+}
+
+static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
+				unsigned long *flags)
+{
+	spin_unlock_irqrestore(&memcg->move_lock, *flags);
+}
+
 /**
  * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
  * @memcg: The memory cgroup that went over limit
@@ -1867,7 +1887,7 @@ void mem_cgroup_update_page_stat(struct page *page,
 
 	if (mem_cgroup_disabled())
 		return;
-
+again:
 	rcu_read_lock();
 	memcg = pc->mem_cgroup;
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
@@ -1875,11 +1895,13 @@ void mem_cgroup_update_page_stat(struct page *page,
 	/* pc->mem_cgroup is unstable ? */
 	if (unlikely(mem_cgroup_stealed(memcg))) {
 		/* take a lock against to access pc->mem_cgroup */
-		move_lock_page_cgroup(pc, &flags);
+		move_lock_mem_cgroup(memcg, &flags);
+		if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
+			move_unlock_mem_cgroup(memcg, &flags);
+			rcu_read_unlock();
+			goto again;
+		}
 		need_unlock = true;
-		memcg = pc->mem_cgroup;
-		if (!memcg || !PageCgroupUsed(pc))
-			goto out;
 	}
 
 	switch (idx) {
@@ -1898,7 +1920,7 @@ void mem_cgroup_update_page_stat(struct page *page,
 
 out:
 	if (unlikely(need_unlock))
-		move_unlock_page_cgroup(pc, &flags);
+		move_unlock_mem_cgroup(memcg, &flags);
 	rcu_read_unlock();
 }
 
@@ -2440,8 +2462,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
-#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MOVE_LOCK) |\
-			(1 << PCG_MIGRATION))
+#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
@@ -2512,7 +2533,7 @@ static int mem_cgroup_move_account(struct page *page,
 	if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
 		goto unlock;
 
-	move_lock_page_cgroup(pc, &flags);
+	move_lock_mem_cgroup(from, &flags);
 
 	if (PageCgroupFileMapped(pc)) {
 		/* Update mapped_file data for mem_cgroup */
@@ -2536,7 +2557,7 @@ static int mem_cgroup_move_account(struct page *page,
 	 * guaranteed that "to" is never removed. So, we don't check rmdir
 	 * status here.
 	 */
-	move_unlock_page_cgroup(pc, &flags);
+	move_unlock_mem_cgroup(from, &flags);
 	ret = 0;
 unlock:
 	unlock_page_cgroup(pc);
@@ -4928,6 +4949,7 @@ mem_cgroup_create(struct cgroup *cont)
 	atomic_set(&memcg->refcnt, 1);
 	memcg->move_charge_at_immigrate = 0;
 	mutex_init(&memcg->thresholds_lock);
+	spin_lock_init(&memcg->move_lock);
 	return &memcg->css;
 free_out:
 	__mem_cgroup_free(memcg);
-- 
1.7.4.1




* [PATCH 4/6] memcg: use new logic for page stat accounting.
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2012-02-17  9:26 ` [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
@ 2012-02-17  9:27 ` KAMEZAWA Hiroyuki
  2012-02-28 12:22   ` Johannes Weiner
  2012-02-17  9:28 ` [PATCH 5/6] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:27 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

From 5bf592d432e4552db74e94a575afcd83aa652a9d Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 11:49:59 +0900
Subject: [PATCH 4/6] memcg: use new logic for page stat accounting.

Currently, per-memcg page stats are recorded in per-page_cgroup flags by
duplicating the page's status into the flag. The reason is that memcg
can move a page from one group to another, and we have a race between
"move" and "page stat accounting".

Under the current logic, assume CPU-A and CPU-B: CPU-A does "move"
and CPU-B does "page stat accounting".

When CPU-A goes first:

            CPU-A                           CPU-B
                                    update "struct page" info.
    move_lock_mem_cgroup(memcg)
    see pc->flags
    copy page stat to new group
    overwrite pc->mem_cgroup.
    move_unlock_mem_cgroup(memcg)
                                    move_lock_mem_cgroup(mem)
                                    set pc->flags
                                    update page stat accounting
                                    move_unlock_mem_cgroup(mem)

Stat accounting is guarded by move_lock_mem_cgroup(), and the "move"
logic (CPU-A) doesn't see the changes to "struct page" information.

But it's costly to have the same information in both 'struct page' and
'struct page_cgroup'. And there is a potential problem.

For example, assume we have PG_dirty accounting in memcg.
PG_... is a flag for struct page.
PCG_... is a flag for struct page_cgroup.
(This is just an example. The same problem can be found in any
 kind of page stat accounting.)

	  CPU-A                               CPU-B
      TestSet PG_dirty
      (delay)                        TestClear PG_dirty
                                     if (TestClear(PCG_dirty))
                                          memcg->nr_dirty--
      if (TestSet(PCG_dirty))
          memcg->nr_dirty++

Here, memcg->nr_dirty ends up at +1, which is wrong.
This race was reported by Greg Thelen <gthelen@google.com>.
For now only FILE_MAPPED is supported, and fortunately it's serialized
by the page table lock, so this is not a real bug, _now_.

Since this potential problem is caused by having duplicated information
in struct page and struct page_cgroup, we may be able to fix it by
using the original 'struct page' information.
But then we'd have a problem with "move account".

Assume we use only PG_dirty.

         CPU-A                   CPU-B
    TestSet PG_dirty
    (delay)                    move_lock_mem_cgroup()
                               if (PageDirty(page))
                                      new_memcg->nr_dirty++
                               pc->mem_cgroup = new_memcg;
                               move_unlock_mem_cgroup()
    move_lock_mem_cgroup()
    memcg = pc->mem_cgroup
    new_memcg->nr_dirty++

The accounting information may be double-counted. This was the original
reason to have the PCG_xxx flags, but as shown above, PCG_xxx has its
own problem.

I think we need a bigger lock, as in:

     move_lock_mem_cgroup(page)
     TestSetPageDirty(page)
     update page stats (without any checks)
     move_unlock_mem_cgroup(page)

This fixes both problems, and we don't have to duplicate page flags
into page_cgroup. Please note: move_lock_mem_cgroup() is held
only when there is a possibility of an "account move" in the system,
so on most paths the status update will go without atomic locks.

This patch introduces mem_cgroup_begin_update_page_stat() and
mem_cgroup_end_update_page_stat(); both should be called when modifying
'struct page' information that memcg accounts for, as in:

     mem_cgroup_begin_update_page_stat()
     modify page information
     mem_cgroup_update_page_stat()
     => never check any 'struct page' info, just update counters.
     mem_cgroup_end_update_page_stat().

This patch is slow because we need to call begin_update_page_stat()/
end_update_page_stat() regardless of whether the accounted state will
change. A following patch adds an easy optimization and reduces the cost.
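
For reference, here is a minimal sketch of a stat-updating path wrapped
in the new API, modeled on the page_add_file_rmap() hunk below (the
function name is illustrative):

void account_file_mapped(struct page *page)
{
	bool locked;
	unsigned long flags;

	/* takes rcu_read_lock(); takes the memcg's move_lock only if
	 * someone may be moving accounts (reported back via 'locked') */
	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
	if (atomic_inc_and_test(&page->_mapcount)) {
		/* 'struct page' state and the memcg counter change in
		 * the same critical section, so a racing move_account()
		 * sees either both updates or neither */
		__inc_zone_page_state(page, NR_FILE_MAPPED);
		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
	}
	mem_cgroup_end_update_page_stat(page, &locked, &flags);
}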

Changes since v4
 - removed unused argument *lock
 - added more comments.
Changes since v3
 - fixed typos

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |   35 ++++++++++++++++++++++++
 mm/memcontrol.c            |   64 ++++++++++++++++++++++++++++++-------------
 mm/rmap.c                  |   15 +++++++++-
 3 files changed, 92 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bf4e1f4..75b6307 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -141,6 +141,31 @@ static inline bool mem_cgroup_disabled(void)
 	return false;
 }
 
+void __mem_cgroup_begin_update_page_stat(struct page *page,
+					bool *lock, unsigned long *flags);
+
+static inline void mem_cgroup_begin_update_page_stat(struct page *page,
+					bool *lock, unsigned long *flags)
+{
+	if (mem_cgroup_disabled())
+		return;
+	rcu_read_lock();
+	*lock = false;
+	return __mem_cgroup_begin_update_page_stat(page, lock, flags);
+}
+
+void __mem_cgroup_end_update_page_stat(struct page *page,
+				unsigned long *flags);
+static inline void mem_cgroup_end_update_page_stat(struct page *page,
+					bool *lock, unsigned long *flags)
+{
+	if (mem_cgroup_disabled())
+		return;
+	if (*lock)
+		__mem_cgroup_end_update_page_stat(page, flags);
+	rcu_read_unlock();
+}
+
 void mem_cgroup_update_page_stat(struct page *page,
 				 enum mem_cgroup_page_stat_item idx,
 				 int val);
@@ -356,6 +381,16 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
 }
 
+static inline void mem_cgroup_begin_update_page_stat(struct page *page,
+					bool *lock, unsigned long *flags)
+{
+}
+
+static inline void mem_cgroup_end_update_page_stat(struct page *page,
+					bool *lock, unsigned long *flags)
+{
+}
+
 static inline void mem_cgroup_inc_page_stat(struct page *page,
 					    enum mem_cgroup_page_stat_item idx)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7065028..86bdde0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1877,32 +1877,61 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
  * If there is, we take a lock.
  */
 
+void __mem_cgroup_begin_update_page_stat(struct page *page,
+				bool *lock, unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+	struct page_cgroup *pc;
+
+	pc = lookup_page_cgroup(page);
+again:
+	memcg = pc->mem_cgroup;
+	if (unlikely(!memcg || !PageCgroupUsed(pc)))
+		return;
+	/*
+	 * If this memory cgroup is not under account moving, we don't
+	 * need to take move_lock_page_cgroup(). Because we already hold
+	 * rcu_read_lock(), any calls to move_account will be delayed until
+	 * rcu_read_unlock() if mem_cgroup_stealed() == true.
+	 */
+	if (!mem_cgroup_stealed(memcg))
+		return;
+
+	move_lock_mem_cgroup(memcg, flags);
+	if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
+		move_unlock_mem_cgroup(memcg, flags);
+		goto again;
+	}
+	*lock = true;
+}
+
+void __mem_cgroup_end_update_page_stat(struct page *page,
+				unsigned long *flags)
+{
+	struct page_cgroup *pc = lookup_page_cgroup(page);
+
+	/*
+	 * It's guaranteed that pc->mem_cgroup never changes while
+	 * lock is held because a routine modifies pc->mem_cgroup
+	 * should take move_lock_page_cgroup().
+	 */
+	move_unlock_mem_cgroup(pc->mem_cgroup, flags);
+}
+
+
 void mem_cgroup_update_page_stat(struct page *page,
 				 enum mem_cgroup_page_stat_item idx, int val)
 {
 	struct mem_cgroup *memcg;
 	struct page_cgroup *pc = lookup_page_cgroup(page);
-	bool need_unlock = false;
 	unsigned long uninitialized_var(flags);
 
 	if (mem_cgroup_disabled())
 		return;
-again:
-	rcu_read_lock();
+
 	memcg = pc->mem_cgroup;
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
-		goto out;
-	/* pc->mem_cgroup is unstable ? */
-	if (unlikely(mem_cgroup_stealed(memcg))) {
-		/* take a lock against to access pc->mem_cgroup */
-		move_lock_mem_cgroup(memcg, &flags);
-		if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
-			move_unlock_mem_cgroup(memcg, &flags);
-			rcu_read_unlock();
-			goto again;
-		}
-		need_unlock = true;
-	}
+		return;
 
 	switch (idx) {
 	case MEMCG_NR_FILE_MAPPED:
@@ -1917,11 +1946,6 @@ again:
 	}
 
 	this_cpu_add(memcg->stat->count[idx], val);
-
-out:
-	if (unlikely(need_unlock))
-		move_unlock_mem_cgroup(memcg, &flags);
-	rcu_read_unlock();
 }
 
 /*
diff --git a/mm/rmap.c b/mm/rmap.c
index aa547d4..dd193db 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1150,10 +1150,15 @@ void page_add_new_anon_rmap(struct page *page,
  */
 void page_add_file_rmap(struct page *page)
 {
+	bool locked;
+	unsigned long flags;
+
+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (atomic_inc_and_test(&page->_mapcount)) {
 		__inc_zone_page_state(page, NR_FILE_MAPPED);
 		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
 	}
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
 
 /**
@@ -1164,9 +1169,13 @@ void page_add_file_rmap(struct page *page)
  */
 void page_remove_rmap(struct page *page)
 {
+	bool locked;
+	unsigned long flags;
+
+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	/* page still mapped by someone else? */
 	if (!atomic_add_negative(-1, &page->_mapcount))
-		return;
+		goto out;
 
 	/*
 	 * Now that the last pte has gone, s390 must transfer dirty
@@ -1183,7 +1192,7 @@ void page_remove_rmap(struct page *page)
 	 * and not charged by memcg for now.
 	 */
 	if (unlikely(PageHuge(page)))
-		return;
+		goto out;
 	if (PageAnon(page)) {
 		mem_cgroup_uncharge_page(page);
 		if (!PageTransHuge(page))
@@ -1204,6 +1213,8 @@ void page_remove_rmap(struct page *page)
 	 * Leaving it set also helps swapoff to reinstate ptes
 	 * faster for those pages still in swapcache.
 	 */
+out:
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
 
 /*
-- 
1.7.4.1




* [PATCH 5/6] memcg: remove PCG_FILE_MAPPED
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2012-02-17  9:27 ` [PATCH 4/6] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
@ 2012-02-17  9:28 ` KAMEZAWA Hiroyuki
  2012-02-18 13:39   ` Johannes Weiner
  2012-02-17  9:28 ` [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

From a19a8eae9c2a3dabb25a50c7a1b2e3a183935260 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 15:02:18 +0900
Subject: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED

With the new lock scheme for updating memcg's page stats, we don't need
the PCG_FILE_MAPPED flag, which duplicated the information of page_mapped().

Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    6 ------
 mm/memcontrol.c             |    6 +-----
 2 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7a3af74..a88cdba 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -6,8 +6,6 @@ enum {
 	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
 	PCG_USED, /* this object is in use. */
 	PCG_MIGRATION, /* under page migration */
-	/* flags for mem_cgroup and file and I/O status */
-	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
 	__NR_PCG_FLAGS,
 };
 
@@ -66,10 +64,6 @@ TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 SETPCGFLAG(Used, USED)
 
-SETPCGFLAG(FileMapped, FILE_MAPPED)
-CLEARPCGFLAG(FileMapped, FILE_MAPPED)
-TESTPCGFLAG(FileMapped, FILE_MAPPED)
-
 SETPCGFLAG(Migration, MIGRATION)
 CLEARPCGFLAG(Migration, MIGRATION)
 TESTPCGFLAG(Migration, MIGRATION)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 86bdde0..e19cffb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1935,10 +1935,6 @@ void mem_cgroup_update_page_stat(struct page *page,
 
 	switch (idx) {
 	case MEMCG_NR_FILE_MAPPED:
-		if (val > 0)
-			SetPageCgroupFileMapped(pc);
-		else if (!page_mapped(page))
-			ClearPageCgroupFileMapped(pc);
 		idx = MEM_CGROUP_STAT_FILE_MAPPED;
 		break;
 	default:
@@ -2559,7 +2555,7 @@ static int mem_cgroup_move_account(struct page *page,
 
 	move_lock_mem_cgroup(from, &flags);
 
-	if (PageCgroupFileMapped(pc)) {
+	if (page_mapped(page)) {
 		/* Update mapped_file data for mem_cgroup */
 		preempt_disable();
 		__this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
-- 
1.7.4.1




* [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat()
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
                   ` (4 preceding siblings ...)
  2012-02-17  9:28 ` [PATCH 5/6] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
@ 2012-02-17  9:28 ` KAMEZAWA Hiroyuki
  2012-02-28 12:25   ` Johannes Weiner
  2012-02-17 10:04 ` [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
  2012-02-17 21:45 ` Andrew Morton
  7 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17  9:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

From 07d3ce332ee4bc1eaef4b8fb2019b0c06bd7afb1 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Mon, 6 Feb 2012 12:14:47 +0900
Subject: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat()

mem_cgroup_begin_update_page_stat() should be very fast because
it's called very frequently. Right now, it needs to look up the
page_cgroup and its memcg, and that is slow.

This patch adds a global variable to check whether any memcg is moving
accounts. With this, the caller doesn't need to visit the page_cgroup
and memcg at all.
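
The fast path then boils down to a single read of a shared variable
(simplified from the memcontrol.h hunk below):

static inline void mem_cgroup_begin_update_page_stat(struct page *page,
					bool *lock, unsigned long *flags)
{
	if (mem_cgroup_disabled())
		return;
	rcu_read_lock();
	*lock = false;
	/*
	 * memcg_moving counts in-flight account moves globally. When
	 * it is 0 (the common case), we skip the page_cgroup/memcg
	 * lookup in the slow path entirely; the synchronize_rcu() in
	 * mem_cgroup_start_move() keeps this check safe.
	 */
	if (atomic_read(&memcg_moving))
		__mem_cgroup_begin_update_page_stat(page, lock, flags);
}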

Here is a test result. A test program maps a file MAP_SHARED, makes
each page's page_mapcount(page) > 1, frees the range with madvise(),
and faults the pages in again. The program causes 26214400 page faults
onto the file (size was 1G) and shows the cost of
mem_cgroup_begin_update_page_stat().

Before this patch (the cost of mem_cgroup_begin_update_page_stat() is visible in the profile):
[kamezawa@bluextal test]$ time ./mmap 1G

real    0m21.765s
user    0m5.999s
sys     0m15.434s

    27.46%     mmap  mmap               [.] reader
    21.15%     mmap  [kernel.kallsyms]  [k] page_fault
     9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
     2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
     2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat

After this patch
[root@bluextal test]# time ./mmap 1G

real    0m21.373s
user    0m6.113s
sys     0m15.016s

On the usual path, the calls to __mem_cgroup_begin_update_page_stat() go away.

Note: we may be able to remove this optimization in the future if
      we can get a pointer to the memcg directly from struct page.

Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Changelog since v5
 - fixed patch description.
---
 include/linux/memcontrol.h |    5 +++--
 mm/memcontrol.c            |    7 ++++++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 75b6307..0710116 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -143,7 +143,7 @@ static inline bool mem_cgroup_disabled(void)
 
 void __mem_cgroup_begin_update_page_stat(struct page *page,
 					bool *lock, unsigned long *flags);
-
+extern atomic_t memcg_moving;
 static inline void mem_cgroup_begin_update_page_stat(struct page *page,
 					bool *lock, unsigned long *flags)
 {
@@ -151,7 +151,8 @@ static inline void mem_cgroup_begin_update_page_stat(struct page *page,
 		return;
 	rcu_read_lock();
 	*lock = false;
-	return __mem_cgroup_begin_update_page_stat(page, lock, flags);
+	if (atomic_read(&memcg_moving))
+		return __mem_cgroup_begin_update_page_stat(page, lock, flags);
 }
 
 void __mem_cgroup_end_update_page_stat(struct page *page,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e19cffb..7dbaa3f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1273,8 +1273,11 @@ int mem_cgroup_swappiness(struct mem_cgroup *memcg)
  *         start move here.
  */
 
+/* for quick checking without looking up memcg */
+atomic_t memcg_moving __read_mostly;
 static void mem_cgroup_start_move(struct mem_cgroup *memcg)
 {
+	atomic_inc(&memcg_moving);
 	atomic_inc(&memcg->moving_account);
 	synchronize_rcu();
 }
@@ -1285,8 +1288,10 @@ static void mem_cgroup_end_move(struct mem_cgroup *memcg)
 	 * Now, mem_cgroup_clear_mc() may call this function with NULL.
 	 * We check NULL in callee rather than caller.
 	 */
-	if (memcg)
+	if (memcg) {
+		atomic_dec(&memcg_moving);
 		atomic_dec(&memcg->moving_account);
+	}
 }
 
 /*
-- 
1.7.4.1




* Re: [PATCH 0/6] page cgroup diet v5
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
                   ` (5 preceding siblings ...)
  2012-02-17  9:28 ` [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
@ 2012-02-17 10:04 ` KAMEZAWA Hiroyuki
  2012-02-17 21:45 ` Andrew Morton
  7 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-17 10:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org, Michal Hocko,
	Hugh Dickins, Greg Thelen, Ying Han

[-- Attachment #1: Type: text/plain, Size: 1452 bytes --]

On Fri, 17 Feb 2012 18:24:26 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> 
> This patch set removes two flags, PCG_FILE_MAPPED and PCG_MOVE_LOCK, from
> page_cgroup->flags. After this, page_cgroup has only 3 bits of flags.
> The set also introduces a new method for updating per-memcg page status
> accounting. With it, we don't have to add new flags to page_cgroup when
> 'struct page' already has the information, which helps avoid new
> page_cgroup flags in the future.
> 
> Fixed the parts that were pointed out:
>  - added more comments
>  - fixed texts
>  - removed redundant arguments
> 
> Passed some tests on 3.3.0-rc3-next-20120216.
> 

Here is a micro-benchmark before/after this series:
mmap 1G twice, then repeatedly fault the pages in and drop them. (The test program is attached.)

== Before == 3 runs after 1st run
[root@bluextal test]# time ./mmap 1G

real    0m21.053s
user    0m6.046s
sys     0m14.743s
[root@bluextal test]# time ./mmap 1G

real    0m21.302s
user    0m6.027s
sys     0m14.979s
[root@bluextal test]# time ./mmap 1G

real    0m21.061s
user    0m6.020s
sys     0m14.722s

== After == 3 runs after 1st run
[root@bluextal test]# time ./mmap 1G

real    0m20.969s
user    0m5.960s
sys     0m14.777s
[root@bluextal test]# time ./mmap 1G

real    0m20.968s
user    0m6.069s
sys     0m14.650s
[root@bluextal test]# time ./mmap 1G

real    0m21.164s
user    0m6.152s
sys     0m14.707s


I think there is no regression.


Thanks,
-Kame




[-- Attachment #2: mmap.c --]
[-- Type: text/x-csrc, Size: 818 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

void reader(int fd, int size)
{
	int i, off, x = 0;	/* dummy accumulator to force the reads */
	char *addr;

	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	for (i = 0; i < 100; i++) {
		for(off = 0; off < size; off += 4096) {
			x += *(addr + off);
		}
		madvise(addr, size, MADV_DONTNEED);
	}
}

int main(int argc, char *argv[])
{
	int fd;
	char *addr;
	unsigned long size;
	struct stat statbuf;

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("cannot open file");
		return 1;
	}

	if (fstat(fd, &statbuf)) {
		perror("fstat failed");
		return 1;
	}
	size = statbuf.st_size;
	/* mmap in 2 places: here and again in reader(). */
	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	mlock(addr, size);
	reader(fd, size);
}


* Re: [PATCH 0/6] page cgroup diet v5
  2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
                   ` (6 preceding siblings ...)
  2012-02-17 10:04 ` [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
@ 2012-02-17 21:45 ` Andrew Morton
  7 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2012-02-17 21:45 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hannes@cmpxchg.org, Michal Hocko, Hugh Dickins, Greg Thelen,
	Ying Han

On Fri, 17 Feb 2012 18:24:26 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> 
> This patch set removes two flags, PCG_FILE_MAPPED and PCG_MOVE_LOCK, from
> page_cgroup->flags. After this, page_cgroup has only 3 bits of flags.
> The set also introduces a new method for updating per-memcg page status
> accounting. With it, we don't have to add new flags to page_cgroup when
> 'struct page' already has the information, which helps avoid new
> page_cgroup flags in the future.
> 
> Fixed the parts that were pointed out:
>  - added more comments
>  - fixed texts
>  - removed redundant arguments
> 

I tweaked a few things here.  Renamed "bool lock;" to "bool locked" in
several places.  Also, the void-returning
mem_cgroup_begin_update_page_stat() was doing an explicit return, which
is valid C but pointless and misleading.


Also, this has been bugging me for a while ;)

From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/memcontrol.c: s/stealed/stolen/

A grammatical fix.

Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff -puN mm/memcontrol.c~a mm/memcontrol.c
--- a/mm/memcontrol.c~a
+++ a/mm/memcontrol.c
@@ -1299,8 +1299,8 @@ static void mem_cgroup_end_move(struct m
 /*
  * 2 routines for checking "mem" is under move_account() or not.
  *
- * mem_cgroup_stealed() - checking a cgroup is mc.from or not. This is used
- *			  for avoiding race in accounting. If true,
+ * mem_cgroup_stolen() -  checking whether a cgroup is mc.from or not. This
+ *			  is used for avoiding races in accounting.  If true,
  *			  pc->mem_cgroup may be overwritten.
  *
  * mem_cgroup_under_move() - checking a cgroup is mc.from or mc.to or
@@ -1308,7 +1308,7 @@ static void mem_cgroup_end_move(struct m
 *			  waiting at high memory pressure caused by "move".
  */
 
-static bool mem_cgroup_stealed(struct mem_cgroup *memcg)
+static bool mem_cgroup_stolen(struct mem_cgroup *memcg)
 {
 	VM_BUG_ON(!rcu_read_lock_held());
 	return atomic_read(&memcg->moving_account) > 0;
@@ -1356,7 +1356,7 @@ static bool mem_cgroup_wait_acct_move(st
  * Take this lock when
  * - a code tries to modify page's memcg while it's USED.
  * - a code tries to modify page state accounting in a memcg.
- * see mem_cgroup_stealed(), too.
+ * see mem_cgroup_stolen(), too.
  */
 static void move_lock_mem_cgroup(struct mem_cgroup *memcg,
 				  unsigned long *flags)
@@ -1899,9 +1899,9 @@ again:
 	 * If this memory cgroup is not under account moving, we don't
 	 * need to take move_lock_page_cgroup(). Because we already hold
 	 * rcu_read_lock(), any calls to move_account will be delayed until
-	 * rcu_read_unlock() if mem_cgroup_stealed() == true.
+	 * rcu_read_unlock() if mem_cgroup_stolen() == true.
 	 */
-	if (!mem_cgroup_stealed(memcg))
+	if (!mem_cgroup_stolen(memcg))
 		return;
 
 	move_lock_mem_cgroup(memcg, flags);
_



* Re: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED
  2012-02-17  9:28 ` [PATCH 5/6] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
@ 2012-02-18 13:39   ` Johannes Weiner
  2012-02-18 14:43     ` Hillf Danton
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2012-02-18 13:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen, Ying Han

On Fri, Feb 17, 2012 at 06:28:18PM +0900, KAMEZAWA Hiroyuki wrote:
> From a19a8eae9c2a3dabb25a50c7a1b2e3a183935260 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 15:02:18 +0900
> Subject: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED
> 
> With the new lock scheme for updating memcg's page stats, we don't need
> the PCG_FILE_MAPPED flag, which duplicated the information of page_mapped().
> 
> Acked-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |    6 ------
>  mm/memcontrol.c             |    6 +-----
>  2 files changed, 1 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7a3af74..a88cdba 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -6,8 +6,6 @@ enum {
>  	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
>  	PCG_USED, /* this object is in use. */
>  	PCG_MIGRATION, /* under page migration */
> -	/* flags for mem_cgroup and file and I/O status */
> -	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
>  	__NR_PCG_FLAGS,
>  };
>  
> @@ -66,10 +64,6 @@ TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  SETPCGFLAG(Used, USED)
>  
> -SETPCGFLAG(FileMapped, FILE_MAPPED)
> -CLEARPCGFLAG(FileMapped, FILE_MAPPED)
> -TESTPCGFLAG(FileMapped, FILE_MAPPED)
> -
>  SETPCGFLAG(Migration, MIGRATION)
>  CLEARPCGFLAG(Migration, MIGRATION)
>  TESTPCGFLAG(Migration, MIGRATION)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 86bdde0..e19cffb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1935,10 +1935,6 @@ void mem_cgroup_update_page_stat(struct page *page,
>  
>  	switch (idx) {
>  	case MEMCG_NR_FILE_MAPPED:
> -		if (val > 0)
> -			SetPageCgroupFileMapped(pc);
> -		else if (!page_mapped(page))
> -			ClearPageCgroupFileMapped(pc);
>  		idx = MEM_CGROUP_STAT_FILE_MAPPED;
>  		break;
>  	default:
> @@ -2559,7 +2555,7 @@ static int mem_cgroup_move_account(struct page *page,
>  
>  	move_lock_mem_cgroup(from, &flags);
>  
> -	if (PageCgroupFileMapped(pc)) {
> +	if (page_mapped(page)) {

As opposed to update_page_stat(), this runs against all types of
pages, so I think it should be

	if (!PageAnon(page) && page_mapped(page))

instead.

>  		/* Update mapped_file data for mem_cgroup */
>  		preempt_disable();
>  		__this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);


* Re: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED
  2012-02-18 13:39   ` Johannes Weiner
@ 2012-02-18 14:43     ` Hillf Danton
  2012-02-19 23:52       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 18+ messages in thread
From: Hillf Danton @ 2012-02-18 14:43 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	Michal Hocko, Hugh Dickins, Greg Thelen

On Sat, Feb 18, 2012 at 9:39 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Fri, Feb 17, 2012 at 06:28:18PM +0900, KAMEZAWA Hiroyuki wrote:
>> @@ -2559,7 +2555,7 @@ static int mem_cgroup_move_account(struct page *page,
>>
>>       move_lock_mem_cgroup(from, &flags);
>>
>> -     if (PageCgroupFileMapped(pc)) {
>> +     if (page_mapped(page)) {
>
> As opposed to update_page_stat(), this runs against all types of
> pages, so I think it should be
>
>        if (!PageAnon(page) && page_mapped(page))
>
> instead.
>
Perhaps the following helper, or something similar, is needed
along with page_mapped():

static inline bool page_is_file_mapping(struct page *page)
{
	struct address_space *mapping = page_mapping(page);

	return mapping && mapping != &swapper_space &&
		((unsigned long)mapping & PAGE_MAPPING_FLAGS) == 0;
}

>>               /* Update mapped_file data for mem_cgroup */
>>               preempt_disable();
>>               __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);


* Re: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED
  2012-02-18 14:43     ` Hillf Danton
@ 2012-02-19 23:52       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-19 23:52 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Johannes Weiner, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen

On Sat, 18 Feb 2012 22:43:58 +0800
Hillf Danton <dhillf@gmail.com> wrote:

> On Sat, Feb 18, 2012 at 9:39 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Fri, Feb 17, 2012 at 06:28:18PM +0900, KAMEZAWA Hiroyuki wrote:
> >> @@ -2559,7 +2555,7 @@ static int mem_cgroup_move_account(struct page *page,
> >>
> >>       move_lock_mem_cgroup(from, &flags);
> >>
> >> -     if (PageCgroupFileMapped(pc)) {
> >> +     if (page_mapped(page)) {
> >
> > As opposed to update_page_stat(), this runs against all types of
> > pages, so I think it should be
> >
> >        if (!PageAnon(page) && page_mapped(page))
> >
> > instead.
> >
> Perhaps the following helper or similar needed,
> along with page_mapped()
> 
> static inline bool page_is_file_mapping(struct page *page)
> {
> 	struct address_space *mapping = page_mapping(page);
> 
> 	return mapping && mapping != &swapper_space &&
> 		((unsigned long)mapping & PAGE_MAPPING_FLAGS) == 0;
> }
> 

Ah, thank you. I'll post a fix soon.

Regards,
-Kame



* Re: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  2012-02-17  9:25 ` [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
@ 2012-02-20  8:54   ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2012-02-20  8:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen, Ying Han

On Fri, Feb 17, 2012 at 06:25:35PM +0900, KAMEZAWA Hiroyuki wrote:
> From 7075e575cc6ee383f8fd161ba44c44a60c295d5f Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 12:05:41 +0900
> Subject: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
> 
> From the log, I guess the EXPORT was in preparation for dirty accounting.
> But we don't need to export this _now_. Remove it for now.
> 
> Reviewed-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 2/6] memcg: simplify move_account() check.
  2012-02-17  9:26 ` [PATCH 2/6] memcg: simplify move_account() check KAMEZAWA Hiroyuki
@ 2012-02-20  8:55   ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2012-02-20  8:55 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen, Ying Han

On Fri, Feb 17, 2012 at 06:26:12PM +0900, KAMEZAWA Hiroyuki wrote:
> From 3b6620772d7fd7b2126d5253eafb6afaf4ed6e34 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 10:02:39 +0900
> Subject: [PATCH 2/6] memcg: simplify move_account() check.
> 
> In memcg, to avoid taking a lock with IRQs disabled when accessing
> page_cgroup, a combination of a flag and rcu_read_lock() is used. It
> works as follows:
> 
>      CPU-A                     CPU-B
>                              rcu_read_lock()
>     set flag
>                              if(flag is set)
>                                    take heavy lock
>                              do job.
>     synchronize_rcu()        rcu_read_unlock()
>     take heavy lock.
> 
> In a recent discussion, it was argued that using a per-cpu value for
> this flag just complicates the code because 'set flag' is very rare.
> 
> This patch changes the 'flag' implementation from percpu to atomic_t,
> which is much simpler.
> 
> Changelog v5.
>  - removed redundant ().
>  - updated patch description.
> 
> Changelog: v4
>  - fixed many typos.
>  - fixed return value to be bool
>  - add comments.
> Changelog: v3
>  - this is a new patch since v3.
> 
> Acked-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Much better!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup
  2012-02-17  9:26 ` [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
@ 2012-02-20  8:56   ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2012-02-20  8:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen, Ying Han

On Fri, Feb 17, 2012 at 06:26:51PM +0900, KAMEZAWA Hiroyuki wrote:
> From 62a96c625be0c30fc5828d88685b6873ed254060 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 11:49:59 +0900
> Subject: [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.
> 
> PCG_MOVE_LOCK is used as a bit spinlock to avoid races between overwriting
> pc->mem_cgroup and per-memcg page statistics accounting.
> The lock helps to avoid the race, but the race itself is very rare because
> moving tasks between cgroups is not a usual job, so using 1 bit per page
> for it seems too costly.
> 
> This patch changes this lock into a per-memcg spinlock and removes
> PCG_MOVE_LOCK.
> 
> If a finer-grained lock is required, we'll be able to add some hashing,
> but I'd like to start with this.
> 
> Changelog:
>   - changed to pass memcg as an argument rather than page_cgroup,
>     and renamed move_lock_page_cgroup() to move_lock_mem_cgroup().
> 
> Acked-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 4/6] memcg: use new logic for page stat accounting.
  2012-02-17  9:27 ` [PATCH 4/6] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
@ 2012-02-28 12:22   ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2012-02-28 12:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen, Ying Han

On Fri, Feb 17, 2012 at 06:27:43PM +0900, KAMEZAWA Hiroyuki wrote:
> From 5bf592d432e4552db74e94a575afcd83aa652a9d Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 11:49:59 +0900
> Subject: [PATCH 4/6] memcg: use new logic for page stat accounting.
> 
> Currently, per-memcg page stats are recorded in per-page_cgroup flags
> by duplicating the page's status into those flags. The reason is that
> memcg can move a page from one group to another, and there is a race
> between "move" and "page stat accounting".
> 
> Under the current logic, assume CPU-A does "move" and CPU-B does
> "page stat accounting".
> 
> When CPU-A goes first:
> 
>             CPU-A                           CPU-B
>                                     update "struct page" info.
>     move_lock_mem_cgroup(memcg)
>     see pc->flags
>     copy page stat to new group
>     overwrite pc->mem_cgroup.
>     move_unlock_mem_cgroup(memcg)
>                                     move_lock_mem_cgroup(memcg)
>                                     set pc->flags
>                                     update page stat accounting
>                                     move_unlock_mem_cgroup(memcg)
> 
> Stat accounting is guarded by move_lock_mem_cgroup(), so the "move"
> logic (CPU-A) never sees half-updated "struct page" information.
> 
> But it is costly to keep the same information in both 'struct page' and
> 'struct page_cgroup'. And there is a potential problem.
> 
> For example, assume we had PG_dirty accounting in memcg.
> PG_... is a flag on struct page.
> PCG_... is a flag on struct page_cgroup.
> (This is just an example; the same problem exists for any kind of
>  page stat accounting.)
> 
> 	  CPU-A                               CPU-B
>       TestSet PG_dirty
>       (delay)                        TestClear PG_dirty
>                                      if (TestClear(PCG_dirty))
>                                           memcg->nr_dirty--
>       if (TestSet(PCG_dirty))
>           memcg->nr_dirty++
> 
> Here, memcg->nr_dirty ends up changed by +1, which is wrong.
> This race was reported by Greg Thelen <gthelen@google.com>.
> Currently only FILE_MAPPED is supported, and fortunately it is
> serialized by the page table lock, so this is not a real bug, _now_.
> 
> Since this potential problem comes from duplicating information between
> struct page and struct page_cgroup, we may be able to fix it by relying
> on the original 'struct page' information only.
> But then we have a problem with "move account":
> 
> Assume we use only PG_dirty.
> 
>          CPU-A                   CPU-B
>     TestSet PG_dirty
>     (delay)                    move_lock_mem_cgroup()
>                                if (PageDirty(page))
>                                       new_memcg->nr_dirty++
>                                pc->mem_cgroup = new_memcg;
>                                move_unlock_mem_cgroup()
>     move_lock_mem_cgroup()
>     memcg = pc->mem_cgroup
>     new_memcg->nr_dirty++
> 
> The accounting information may be double-counted. This was the original
> reason for having the PCG_xxx flags, but as shown above, PCG_xxx has
> its own problem.
> 
> I think we need a bigger lock, as in:
> 
>      move_lock_mem_cgroup(page)
>      TestSetPageDirty(page)
>      update page stats (without any checks)
>      move_unlock_mem_cgroup(page)
> 
> This fixes both problems, and we don't have to duplicate page flags
> into page_cgroup anymore. Please note: move_lock_mem_cgroup() is held
> only when there is a possibility of "account move" somewhere in the
> system, so on most paths the status update proceeds without atomic
> locks.
> 
> This patch introduces mem_cgroup_begin_update_page_stat() and
> mem_cgroup_end_update_page_stat(); both should be called when modifying
> 'struct page' information that memcg accounts for, as in:
> 
>      mem_cgroup_begin_update_page_stat()
>      modify page information
>      mem_cgroup_update_page_stat()
>      => never check any 'struct page' info, just update counters.
>      mem_cgroup_end_update_page_stat().
> 
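A sketch of a caller following this sequence, using the FILE_MAPPED case
mentioned above as the example; the exact signatures are assumptions
based on this description:

	/* e.g. in page_add_file_rmap() */
	bool locked;
	unsigned long flags;

	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
	if (atomic_inc_and_test(&page->_mapcount))
		/* just bump the counter; no 'struct page' re-checks */
		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
	mem_cgroup_end_update_page_stat(page, &locked, &flags);
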
> This patch is slow because we need to call begin_update_page_stat()/
> end_update_page_stat() regardless of whether the accounted value will
> change or not. A following patch adds an easy optimization to reduce
> the cost.
> 
> Changes since v4:
>  - removed the unused argument *lock.
>  - added more comments.
> Changes since v3:
>  - fixed typos.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat()
  2012-02-17  9:28 ` [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
@ 2012-02-28 12:25   ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2012-02-28 12:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, Michal Hocko, Hugh Dickins,
	Greg Thelen, Ying Han

On Fri, Feb 17, 2012 at 06:28:51PM +0900, KAMEZAWA Hiroyuki wrote:
> >From 07d3ce332ee4bc1eaef4b8fb2019b0c06bd7afb1 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Mon, 6 Feb 2012 12:14:47 +0900
> Subject: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat()
> 
> mem_cgroup_begin_update_page_stat() should be very fast because
> it is called very frequently. Currently, it needs to look up the
> page_cgroup and its memcg, which is slow.
> 
> This patch adds a global variable for checking whether any memcg is
> moving accounts or not. With this, the caller doesn't need to visit
> the page_cgroup and memcg at all in the common case.
> 
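A minimal sketch of the resulting fast path, assuming a global counter
named memcg_moving; the name and details are illustrative:

	/* elevated for the whole duration of any account move */
	static atomic_t memcg_moving __read_mostly;

	void mem_cgroup_begin_update_page_stat(struct page *page,
					       bool *locked,
					       unsigned long *flags)
	{
		rcu_read_lock();
		*locked = false;
		/* common case: no move in flight anywhere, stop here */
		if (atomic_read(&memcg_moving))
			__mem_cgroup_begin_update_page_stat(page, locked,
							    flags);
	}
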
> Here is a test result. A test program page-faults a MAP_SHARED file
> mapping, makes each page's page_mapcount(page) > 1, frees the range
> with madvise(), and then faults it in again. The program causes
> 26214400 page faults on a 1G file and shows the cost of
> mem_cgroup_begin_update_page_stat().
> 
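The test program itself is not part of the mail; a rough reconstruction
along the described lines, with the file size and pass count guessed to
match the quoted fault count (100 passes x 262144 pages = 26214400),
could look like:

	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/mman.h>

	#define SZ (1UL << 30)	/* 1G file */
	#define PG 4096UL	/* assumed page size */

	int main(void)
	{
		int fd = open("testfile", O_RDWR | O_CREAT, 0644);
		char *a, *b;
		unsigned long off;
		int pass;

		if (fd < 0 || ftruncate(fd, SZ) < 0)
			return 1;
		a = mmap(NULL, SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		b = mmap(NULL, SZ, PROT_READ, MAP_SHARED, fd, 0);
		if (a == MAP_FAILED || b == MAP_FAILED)
			return 1;
		/* fault the second mapping once: page_mapcount() > 1 */
		for (off = 0; off < SZ; off += PG)
			*(volatile char *)(b + off);
		for (pass = 0; pass < 100; pass++) {
			for (off = 0; off < SZ; off += PG)
				a[off] = 1;	/* fault every page */
			/* drop the PTEs so the next pass faults again */
			madvise(a, SZ, MADV_DONTNEED);
		}
		return 0;
	}
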
> Before this patch (__mem_cgroup_begin_update_page_stat() shows up in
> the profile):
> [kamezawa@bluextal test]$ time ./mmap 1G
> 
> real    0m21.765s
> user    0m5.999s
> sys     0m15.434s
> 
>     27.46%     mmap  mmap               [.] reader
>     21.15%     mmap  [kernel.kallsyms]  [k] page_fault
>      9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
>      2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
>      2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat
> 
> After this patch
> [root@bluextal test]# time ./mmap 1G
> 
> real    0m21.373s
> user    0m6.113s
> sys     0m15.016s
> 
> On the usual path, calls to __mem_cgroup_begin_update_page_stat() go away.
> 
> Note: we may be able to remove this optimization in the future if we
> can get a pointer to the memcg directly from struct page.
> 
> Acked-by: Greg Thelen <gthelen@google.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-02-28 12:25 UTC | newest]

Thread overview: 18+ messages
2012-02-17  9:24 [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
2012-02-17  9:25 ` [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
2012-02-20  8:54   ` Johannes Weiner
2012-02-17  9:26 ` [PATCH 2/6] memcg: simplify move_account() check KAMEZAWA Hiroyuki
2012-02-20  8:55   ` Johannes Weiner
2012-02-17  9:26 ` [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
2012-02-20  8:56   ` Johannes Weiner
2012-02-17  9:27 ` [PATCH 4/6] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
2012-02-28 12:22   ` Johannes Weiner
2012-02-17  9:28 ` [PATCH 5/6] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
2012-02-18 13:39   ` Johannes Weiner
2012-02-18 14:43     ` Hillf Danton
2012-02-19 23:52       ` KAMEZAWA Hiroyuki
2012-02-17  9:28 ` [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
2012-02-28 12:25   ` Johannes Weiner
2012-02-17 10:04 ` [PATCH 0/6] page cgroup diet v5 KAMEZAWA Hiroyuki
2012-02-17 21:45 ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2012-02-06 10:06 [RFC] [PATCH 0/6 v3] memcg: page cgroup diet KAMEZAWA Hiroyuki
2012-02-06 10:09 ` [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
