From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: "linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: [RFC] Shared page accounting for memory cgroup
Date: Tue, 29 Dec 2009 23:57:43 +0530
Message-ID: <20091229182743.GB12533@balbir.in.ibm.com>

Hi, Everyone,

I've been working on heuristics for shared page accounting for the
memory cgroup. I've tested the patches by creating multiple cgroups
and running programs that share memory and observed the output.

Comments?


Add shared accounting to memcg

From: Balbir Singh <balbir@linux.vnet.ibm.com>

Currently there is no accurate way of estimating how many pages are
shared in a memory cgroup. An exact count of shared memory would
require us to either

1. Follow the rmap of every page and track the number of users, or
2. Iterate through the pages and use _mapcount

We take an intermediate approach (suggested by Kamezawa): we sum up
the file and anon rss of the mm's belonging to the cgroup and then
subtract the cgroup's own rss and file mapped counts. This should give
us a good estimate of the pages being shared.
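
As a rough user-space illustration of the arithmetic described above
(not the kernel code itself), the estimate can be sketched as follows.
The struct example_mm type, the estimate_shared_bytes() helper and the
EXAMPLE_PAGE_SHIFT value of 12 are all assumptions made for this
example; the kernel uses the mm counters and PAGE_SHIFT instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page shift for this example (4 KiB pages). */
#define EXAMPLE_PAGE_SHIFT 12

/* Hypothetical stand-in for one mm's file_rss/anon_rss, in pages. */
struct example_mm {
	uint64_t file_rss;
	uint64_t anon_rss;
};

/*
 * Sum the per-mm rss, subtract the cgroup's own rss + file mapped
 * counts, clamp a negative result to zero and convert pages to bytes.
 */
static uint64_t estimate_shared_bytes(const struct example_mm *mms,
				      size_t nr_mm, int64_t cg_rss,
				      int64_t cg_file_mapped)
{
	uint64_t total_rss = 0;
	int64_t shared;
	size_t i;

	assert(mms != NULL || nr_mm == 0);
	for (i = 0; i < nr_mm; i++)
		total_rss += mms[i].file_rss + mms[i].anon_rss;

	shared = (int64_t)total_rss - (cg_rss + cg_file_mapped);
	if (shared < 0)
		shared = 0;	/* the two samples may be out of sync */
	return (uint64_t)shared << EXAMPLE_PAGE_SHIFT;
}
```

For instance, assume two processes that each map the same 100 file
pages and each have 50 private anon pages, with the cgroup counting
every page once (rss = 100, file mapped = 100). The per-mm sum is 300
pages, the cgroup total is 200 pages, and the estimate is 100 pages.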

The shared statistic is called memory.shared_usage_in_bytes and
does not support hierarchical information, just the information
for the current cgroup.
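
A minimal sketch of how a user-space tool might read the new file is
shown here. The read_cgroup_u64() helper is hypothetical, and the path
passed to it depends on where the memory cgroup hierarchy is mounted,
so any concrete path is an assumption. The sketch also assumes the
file contains a single decimal value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a cgroup control file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_cgroup_u64(const char *path, uint64_t *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass the full path of memory.shared_usage_in_bytes
inside the cgroup of interest and treat a return value of -1 as "not
available", for example on a kernel without this patch.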

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 Documentation/cgroups/memory.txt |    6 +++++
 mm/memcontrol.c                  |   43 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 0 deletions(-)


diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index b871f25..c2c70c9 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -341,6 +341,12 @@ Note:
   - a cgroup which uses hierarchy and it has child cgroup.
   - a cgroup which uses hierarchy and not the root of hierarchy.
 
+5.4 shared_usage_in_bytes
+  This file reports an estimate of the number of shared bytes. The
+  estimate sums the anon and file rss counts of all the mm's belonging
+  to the cgroup and subtracts from that sum the rss and file mapped
+  counts maintained within the memory cgroup statistics (see section
+  5.2). Hierarchical information is not included.
 
 6. Hierarchy support
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 488b644..8e296be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3052,6 +3052,45 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+static u64 mem_cgroup_shared_read(struct cgroup *cgrp, struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	struct cgroup_iter it;
+	struct task_struct *tsk;
+	u64 total_rss = 0, shared;
+	struct mm_struct *mm;
+	s64 val;
+
+	cgroup_iter_start(cgrp, &it);
+	val = mem_cgroup_read_stat(&memcg->stat, MEM_CGROUP_STAT_RSS);
+	val += mem_cgroup_read_stat(&memcg->stat, MEM_CGROUP_STAT_FILE_MAPPED);
+	while ((tsk = cgroup_iter_next(cgrp, &it))) {
+		if (!thread_group_leader(tsk))
+			continue;
+		mm = tsk->mm;
+		/*
+		 * We can't use get_task_mm(), since its counterpart mmput()
+		 * can sleep. We know that mm can't become invalid since
+		 * we hold the css_set_lock (see cgroup_iter_start()).
+		 */
+		if ((tsk->flags & PF_KTHREAD) || !mm)
+			continue;
+		total_rss += get_mm_counter(mm, file_rss) +
+				get_mm_counter(mm, anon_rss);
+	}
+	cgroup_iter_end(cgrp, &it);
+
+	/*
+	 * total_rss and val are sampled at different times, so the
+	 * difference can be transiently negative; clamp it to zero.
+	 * The shared value converges to the correct value quite soon,
+	 * depending on how quickly the workload's memory usage changes.
+	 */
+	shared = total_rss - val;
+	shared = max_t(s64, 0, shared);
+	shared <<= PAGE_SHIFT;
+	return shared;
+}
 
 static struct cftype mem_cgroup_files[] = {
 	{
@@ -3101,6 +3140,10 @@ static struct cftype mem_cgroup_files[] = {
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
 	},
+	{
+		.name = "shared_usage_in_bytes",
+		.read_u64 = mem_cgroup_shared_read,
+	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP

-- 
	Balbir
