cgroups.vger.kernel.org archive mirror
* [BUGFIX][PATCH] add mem_cgroup_replace_page_cache.
@ 2011-12-06  3:39 KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki @ 2011-12-06  3:39 UTC
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Miklos Szeredi,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, Michal Hocko,
	Hugh Dickins


Hm, is this too naive? Better ideas are welcome.
==
From 33638351c5cd28af9f47f9ab1c44eeb1f63d9964 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Date: Tue, 6 Dec 2011 12:32:32 +0900
Subject: [PATCH] memcg: add mem_cgroup_replace_page_cache() for fixing LRU issue.

Commit ef6a3c6311 added replace_page_cache_page(), which replaces a page
in the radix tree with a new page. When it does so, the memory cgroup
has to fix up its accounting information; memcg needs to check the
PCG_USED bit, etc.

In some (many?) cases, 'newpage' is already on the LRU before
replace_page_cache_page() is called, so memcg's LRU accounting
information has to be fixed up, too.

This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
In that function, the old page is unaccounted without touching res_counter
and the new page is accounted to the memcg of the old page. While
overwriting pc->mem_cgroup of newpage, it takes zone->lru_lock to avoid
racing with LRU handling.

Background:
  replace_page_cache_page() is called by the FUSE code in its splice() handling.
  There, 'newpage' replaces oldpage, but newpage is not a newly allocated
  page and may already be on the LRU. LRU mis-accounting is critical for the
  memory cgroup because rmdir() checks that the whole LRU is empty and that
  no charge has leaked. If a page is on a different LRU than the one it
  should be on, rmdir() will fail.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 include/linux/memcontrol.h |    6 ++++++
 mm/filemap.c               |   18 ++----------------
 mm/memcontrol.c            |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4b70e05..bd3b102 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -123,6 +123,8 @@ struct zone_reclaim_stat*
 mem_cgroup_get_reclaim_stat_from_page(struct page *page);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
+extern void mem_cgroup_replace_page_cache(struct page *oldpage,
+					struct page *newpage);
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
@@ -382,6 +384,10 @@ static inline
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
+static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
+				struct page *newpage)
+{
+}
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
diff --git a/mm/filemap.c b/mm/filemap.c
index a7b572b..4642211 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -393,24 +393,11 @@ EXPORT_SYMBOL(filemap_write_and_wait_range);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 {
 	int error;
-	struct mem_cgroup *memcg = NULL;
 
 	VM_BUG_ON(!PageLocked(old));
 	VM_BUG_ON(!PageLocked(new));
 	VM_BUG_ON(new->mapping);
 
-	/*
-	 * This is not page migration, but prepare_migration and
-	 * end_migration does enough work for charge replacement.
-	 *
-	 * In the longer term we probably want a specialized function
-	 * for moving the charge from old to new in a more efficient
-	 * manner.
-	 */
-	error = mem_cgroup_prepare_migration(old, new, &memcg, gfp_mask);
-	if (error)
-		return error;
-
 	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
 	if (!error) {
 		struct address_space *mapping = old->mapping;
@@ -432,13 +419,12 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 		if (PageSwapBacked(new))
 			__inc_zone_page_state(new, NR_SHMEM);
 		spin_unlock_irq(&mapping->tree_lock);
+		/* mem_cgroup code must not be called under tree_lock */
+		mem_cgroup_replace_page_cache(old, new);
 		radix_tree_preload_end();
 		if (freepage)
 			freepage(old);
 		page_cache_release(old);
-		mem_cgroup_end_migration(memcg, old, new, true);
-	} else {
-		mem_cgroup_end_migration(memcg, old, new, false);
 	}
 
 	return error;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8880a32..a9e92a6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3306,6 +3306,47 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
 	cgroup_release_and_wakeup_rmdir(&memcg->css);
 }
 
+/*
+ * When a page cache page is replaced, newpage is not charged to any
+ * memcg but may already be on the LRU. So this function does not touch
+ * res_counter but handles the LRU correctly.
+ */
+void mem_cgroup_replace_page_cache(struct page *oldpage,
+				  struct page *newpage)
+{
+	struct mem_cgroup *memcg;
+	struct page_cgroup *pc;
+	struct zone *zone;
+	enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE;
+	unsigned long flags;
+
+	pc = lookup_page_cgroup(oldpage);
+	/* fix accounting on old pages */
+	lock_page_cgroup(pc);
+	memcg = pc->mem_cgroup;
+	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), -1);
+	ClearPageCgroupUsed(pc);
+	unlock_page_cgroup(pc);
+
+	if (PageSwapBacked(oldpage))
+		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
+
+	zone = page_zone(newpage);
+	pc = lookup_page_cgroup(newpage);
+	/*
+	 * Even if newpage->mapping was NULL before starting replacement,
+	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
+	 * LRU while we overwrite pc->mem_cgroup.
+	 */
+	spin_lock_irqsave(&zone->lru_lock, flags);
+	if (PageLRU(newpage))
+		del_page_from_lru_list(zone, newpage, page_lru(newpage));
+	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type);
+	if (PageLRU(newpage))
+		add_page_to_lru_list(zone, newpage, page_lru(newpage));
+	spin_unlock_irqrestore(&zone->lru_lock, flags);
+}
+
 #ifdef CONFIG_DEBUG_VM
 static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
 {
-- 
1.7.4.1
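
Editorial sketch (not part of the patch): the commit message argues that
overwriting pc->mem_cgroup of a page that is already on the LRU, without
unlinking and relinking it, leaves the per-memcg LRU bookkeeping
inconsistent, and that rmdir() then fails. The user-space toy model below
tries to illustrate that argument only. Every toy_* name is hypothetical and
invented here, the LRU is reduced to a counter per group, and the
zone->lru_lock that the real function takes around the relink is only
mentioned in a comment. It is a sketch under those assumptions, not the
kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's real types. */
struct toy_memcg {
	int lru_count;   /* pages this group believes are on its LRU */
	int cache_stat;  /* page cache pages charged to this group */
};

struct toy_page {
	struct toy_memcg *owner; /* plays the role of pc->mem_cgroup */
	bool on_lru;             /* plays the role of PageLRU() */
	bool used;               /* plays the role of the PCG_USED bit */
};

/* LRU bookkeeping always goes through the page's current owner. */
static void toy_lru_add(struct toy_page *p)
{
	assert(p->owner != NULL);
	p->owner->lru_count++;
}

static void toy_lru_del(struct toy_page *p)
{
	assert(p->owner != NULL);
	p->owner->lru_count--;
}

/* Move the charge from oldp to newp; relink selects the patch's approach. */
static void toy_replace(struct toy_page *oldp, struct toy_page *newp,
			bool relink)
{
	struct toy_memcg *memcg = oldp->owner;

	/* Unaccount the old page; no res_counter equivalent is touched. */
	memcg->cache_stat--;
	oldp->used = false;

	/* The kernel would hold zone->lru_lock from here to the end. */
	if (relink && newp->on_lru)
		toy_lru_del(newp);   /* unlink while owner is still the old one */
	newp->owner = memcg;
	newp->used = true;
	memcg->cache_stat++;
	if (relink && newp->on_lru)
		toy_lru_add(newp);   /* relink under the new owner */
}

/* Drop a page: uncharge it if charged, then take it off the LRU. */
static void toy_free(struct toy_page *p)
{
	if (p->used) {
		p->owner->cache_stat--;
		p->used = false;
	}
	if (p->on_lru) {
		toy_lru_del(p);
		p->on_lru = false;
	}
}

/* Run the FUSE-like scenario and report whether the group looks empty. */
static bool toy_rmdir_would_succeed(bool relink)
{
	struct toy_memcg root = { 0, 0 }, grp = { 0, 0 };
	struct toy_page oldp = { &grp, true, true };
	struct toy_page newp = { &root, true, false }; /* uncharged, on LRU */

	toy_lru_add(&oldp);
	grp.cache_stat = 1;
	toy_lru_add(&newp);

	toy_replace(&oldp, &newp, relink);
	toy_free(&oldp);
	toy_free(&newp);

	return grp.lru_count == 0 && grp.cache_stat == 0 &&
	       root.lru_count == 0;
}
```

In this model, toy_rmdir_would_succeed(true) leaves every counter at zero,
while toy_rmdir_would_succeed(false) leaves one page counted on the wrong
group's list, which is the kind of leftover that the rmdir() emptiness check
described in the commit message would trip over.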




Thread overview: 11+ messages (newest: 2011-12-12  0:48 UTC)
2011-12-06  3:39 [BUGFIX][PATCH] add mem_cgroup_replace_page_cache KAMEZAWA Hiroyuki
     [not found] ` <20111206123923.1432ab52.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2011-12-06 10:12   ` [RFC][PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
2011-12-06 10:13     ` [RFC][PATCH 2/4] memcg: simplify corner case handling of LRU and charge races KAMEZAWA Hiroyuki
     [not found]     ` <20111206191211.3be32ccb.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2011-12-06 10:15       ` [RFC][PATCH 3/4] memcg: clear pc->mem_cgroup if necessary KAMEZAWA Hiroyuki
2011-12-06 10:17       ` [RFC][PATCH 4/4] memcg: new LRU rule KAMEZAWA Hiroyuki
2011-12-07  9:21   ` [BUGFIX][PATCH] add mem_cgroup_replace_page_cache Johannes Weiner
2011-12-07 11:14   ` Michal Hocko
     [not found]     ` <20111207111455.GA18249-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org>
2011-12-08  7:18       ` [BUGFIX][PATCH v2] " KAMEZAWA Hiroyuki
     [not found]         ` <20111208161829.b6101de6.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2011-12-08  9:31           ` Michal Hocko
2011-12-09 20:37           ` Andrew Morton
2011-12-12  0:48             ` KAMEZAWA Hiroyuki
