* [PATCH 1/12] maps4: add proportional set size accounting in smaps
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 2/12] maps4: From: Dave Hansen <haveblue@us.ibm.com> Matt Mackall
` (10 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Fengguang Wu <wfg@mail.ustc.edu.cn>
The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing
it. So if a process has 1000 pages all to itself, and 1000 shared with one
other process, its PSS will be 1500.
- lwn.net: "ELC: How much memory are applications really using?"
The PSS proposed by Matt Mackall is a very nice metic for measuring an
process's memory footprint. So collect and export it via
/proc/<pid>/smaps.
Matt Mackall's pagemap/kpagemap and John Berthels's exmap can also do the
job. They are comprehensive tools. But for PSS, let's do it in the simple
way.
Cc: John Berthels <jjberthels@gmail.com>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Cc: Padraig Brady <P@draigBrady.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Hugh Dickins <hugh@veritas.com>
---
fs/proc/task_mmu.c | 29 ++++++++++++++++++++++++++++-
1 files changed, 28 insertions(+), 1 deletion(-)
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-16 21:13:47.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-22 15:53:12.000000000 -0500
@@ -114,6 +114,25 @@ static void pad_len_spaces(struct seq_fi
seq_printf(m, "%*c", len, ' ');
}
+/*
+ * Proportional Set Size(PSS): my share of RSS.
+ *
+ * PSS of a process is the count of pages it has in memory, where each
+ * page is divided by the number of processes sharing it. So if a
+ * process has 1000 pages all to itself, and 1000 shared with one other
+ * process, its PSS will be 1500.
+ *
+ * To keep (accumulated) division errors low, we adopt a 64bit
+ * fixed-point pss counter to minimize division errors. So (pss >>
+ * PSS_SHIFT) would be the real byte count.
+ *
+ * A shift of 12 before division means (assuming 4K page size):
+ * - 1M 3-user-pages add up to 8KB errors;
+ * - supports mapcount up to 2^24, or 16M;
+ * - supports PSS up to 2^52 bytes, or 4PB.
+ */
+#define PSS_SHIFT 12
+
struct mem_size_stats
{
unsigned long resident;
@@ -122,6 +141,7 @@ struct mem_size_stats
unsigned long private_clean;
unsigned long private_dirty;
unsigned long referenced;
+ u64 pss;
};
struct pmd_walker {
@@ -195,6 +215,7 @@ static int show_map_internal(struct seq_
seq_printf(m,
"Size: %8lu kB\n"
"Rss: %8lu kB\n"
+ "Pss: %8lu kB\n"
"Shared_Clean: %8lu kB\n"
"Shared_Dirty: %8lu kB\n"
"Private_Clean: %8lu kB\n"
@@ -202,6 +223,7 @@ static int show_map_internal(struct seq_
"Referenced: %8lu kB\n",
(vma->vm_end - vma->vm_start) >> 10,
mss->resident >> 10,
+ (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
mss->shared_clean >> 10,
mss->shared_dirty >> 10,
mss->private_clean >> 10,
@@ -226,6 +248,7 @@ static void smaps_pte_range(struct vm_ar
pte_t *pte, ptent;
spinlock_t *ptl;
struct page *page;
+ int mapcount;
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
@@ -242,16 +265,19 @@ static void smaps_pte_range(struct vm_ar
/* Accumulate the size in pages that have been accessed. */
if (pte_young(ptent) || PageReferenced(page))
mss->referenced += PAGE_SIZE;
- if (page_mapcount(page) >= 2) {
+ mapcount = page_mapcount(page);
+ if (mapcount >= 2) {
if (pte_dirty(ptent))
mss->shared_dirty += PAGE_SIZE;
else
mss->shared_clean += PAGE_SIZE;
+ mss->pss += (PAGE_SIZE << PSS_SHIFT) / mapcount;
} else {
if (pte_dirty(ptent))
mss->private_dirty += PAGE_SIZE;
else
mss->private_clean += PAGE_SIZE;
+ mss->pss += (PAGE_SIZE << PSS_SHIFT);
}
}
pte_unmap_unlock(pte - 1, ptl);
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 2/12] maps4: From: Dave Hansen <haveblue@us.ibm.com>
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
2007-10-26 16:36 ` [PATCH 1/12] maps4: add proportional set size accounting in smaps Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 21:22 ` David Rientjes
2007-10-26 16:36 ` [PATCH 3/12] maps4: move is_swap_pte Matt Mackall
` (9 subsequent siblings)
11 siblings, 1 reply; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Dave Hansen <haveblue@us.ibm.com>
The following replaces the earlier patches sent. It should address
David Rientjes's comments, and has been compile tested on all the
architectures that it touches, save for parisc.
----
For the /proc/<pid>/pagemap code[1], we need to able to query how
much virtual address space a particular task has. The trick is
that we do it through /proc and can't use TASK_SIZE since it
references "current" on some arches. The process opening the
/proc file might be a 32-bit process opening a 64-bit process's
pagemap file.
x86_64 already has a TASK_SIZE_OF() macro:
#define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_IA32)) ? IA32_PAGE_OFFSET : TASK_SIZE64)
I'd like to have that for other architectures. So, add it
for all the architectures that actually use "current" in
their TASK_SIZE. For the others, just add a quick #define
in sched.h to use plain old TASK_SIZE.
1. http://www.linuxworld.com/news/2007/042407-kernel.html
- MIPS portion from Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: David Rientjes <rientjes@google.com>
---
lxc-dave/include/asm-ia64/processor.h | 3 ++-
lxc-dave/include/asm-mips/processor.h | 4 ++++
lxc-dave/include/asm-parisc/processor.h | 3 ++-
lxc-dave/include/asm-powerpc/processor.h | 3 ++-
lxc-dave/include/asm-s390/processor.h | 3 ++-
lxc-dave/include/linux/sched.h | 4 ++++
6 files changed, 16 insertions(+), 4 deletions(-)
Index: l/include/asm-ia64/processor.h
===================================================================
--- l.orig/include/asm-ia64/processor.h 2007-10-16 21:13:47.000000000 -0500
+++ l/include/asm-ia64/processor.h 2007-10-22 15:53:27.000000000 -0500
@@ -31,7 +31,8 @@
* each (assuming 8KB page size), for a total of 8TB of user virtual
* address space.
*/
-#define TASK_SIZE (current->thread.task_size)
+#define TASK_SIZE_OF(tsk) ((tsk)->thread.task_size)
+#define TASK_SIZE TASK_SIZE_OF(current)
/*
* This decides where the kernel will search for a free chunk of vm
Index: l/include/asm-mips/processor.h
===================================================================
--- l.orig/include/asm-mips/processor.h 2007-10-16 21:13:47.000000000 -0500
+++ l/include/asm-mips/processor.h 2007-10-22 15:53:27.000000000 -0500
@@ -45,6 +45,8 @@ extern unsigned int vced_count, vcei_cou
* space during mmap's.
*/
#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+#define TASK_SIZE_OF(tsk) \
+ (test_tsk_thread_flag(tsk, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
#endif
#ifdef CONFIG_64BIT
@@ -65,6 +67,8 @@ extern unsigned int vced_count, vcei_cou
#define TASK_UNMAPPED_BASE \
(test_thread_flag(TIF_32BIT_ADDR) ? \
PAGE_ALIGN(TASK_SIZE32 / 3) : PAGE_ALIGN(TASK_SIZE / 3))
+#define TASK_SIZE_OF(tsk) \
+ (test_tsk_thread_flag(tsk, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
#endif
#define NUM_FPU_REGS 32
Index: l/include/asm-parisc/processor.h
===================================================================
--- l.orig/include/asm-parisc/processor.h 2007-10-16 21:13:47.000000000 -0500
+++ l/include/asm-parisc/processor.h 2007-10-22 15:53:27.000000000 -0500
@@ -32,7 +32,8 @@
#endif
#define current_text_addr() ({ void *pc; current_ia(pc); pc; })
-#define TASK_SIZE (current->thread.task_size)
+#define TASK_SIZE_OF(tsk) ((tsk)->thread.task_size)
+#define TASK_SIZE TASK_SIZE_OF(current)
#define TASK_UNMAPPED_BASE (current->thread.map_base)
#define DEFAULT_TASK_SIZE32 (0xFFF00000UL)
Index: l/include/asm-powerpc/processor.h
===================================================================
--- l.orig/include/asm-powerpc/processor.h 2007-10-16 21:13:47.000000000 -0500
+++ l/include/asm-powerpc/processor.h 2007-10-22 15:53:27.000000000 -0500
@@ -99,8 +99,9 @@ extern struct task_struct *last_task_use
*/
#define TASK_SIZE_USER32 (0x0000000100000000UL - (1*PAGE_SIZE))
-#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
+#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
TASK_SIZE_USER32 : TASK_SIZE_USER64)
+#define TASK_SIZE TASK_SIZE_OF(current)
/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
Index: l/include/asm-s390/processor.h
===================================================================
--- l.orig/include/asm-s390/processor.h 2007-10-16 21:13:47.000000000 -0500
+++ l/include/asm-s390/processor.h 2007-10-22 15:53:27.000000000 -0500
@@ -73,8 +73,9 @@ extern struct task_struct *last_task_use
#else /* __s390x__ */
-# define TASK_SIZE (test_thread_flag(TIF_31BIT) ? \
+# define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_31BIT) ? \
(0x80000000UL) : (0x40000000000UL))
+# define TASK_SIZE TASK_SIZE_OF(current)
# define TASK_UNMAPPED_BASE (TASK_SIZE / 2)
# define DEFAULT_TASK_SIZE (0x40000000000UL)
Index: l/include/linux/sched.h
===================================================================
--- l.orig/include/linux/sched.h 2007-10-16 21:13:47.000000000 -0500
+++ l/include/linux/sched.h 2007-10-22 15:53:27.000000000 -0500
@@ -1880,6 +1880,10 @@ static inline void inc_syscw(struct task
}
#endif
+#ifndef TASK_SIZE_OF
+#define TASK_SIZE_OF(tsk) TASK_SIZE
+#endif
+
#endif /* __KERNEL__ */
#endif
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH 2/12] maps4: From: Dave Hansen <haveblue@us.ibm.com>
2007-10-26 16:36 ` [PATCH 2/12] maps4: From: Dave Hansen <haveblue@us.ibm.com> Matt Mackall
@ 2007-10-26 21:22 ` David Rientjes
0 siblings, 0 replies; 15+ messages in thread
From: David Rientjes @ 2007-10-26 21:22 UTC (permalink / raw)
To: Matt Mackall
Cc: Andrew Morton, linux-kernel, Dave Hansen, Rusty Russell,
Jeremy Fitzhardinge, Fengguang Wu
On Fri, 26 Oct 2007, Matt Mackall wrote:
> The following replaces the earlier patches sent. It should address
> David Rientjes's comments, and has been compile tested on all the
> architectures that it touches, save for parisc.
>
Looks good, thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/12] maps4: move is_swap_pte
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
2007-10-26 16:36 ` [PATCH 1/12] maps4: add proportional set size accounting in smaps Matt Mackall
2007-10-26 16:36 ` [PATCH 2/12] maps4: From: Dave Hansen <haveblue@us.ibm.com> Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 4/12] maps4: introduce a generic page walker Matt Mackall
` (8 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
Move is_swap_pte helper function to swapops.h for use by pagemap code
Signed-off-by: Matt Mackall <mpm@selenic.com>
Index: l/include/linux/swapops.h
===================================================================
--- l.orig/include/linux/swapops.h 2007-10-09 17:36:25.000000000 -0500
+++ l/include/linux/swapops.h 2007-10-10 11:46:34.000000000 -0500
@@ -42,6 +42,12 @@ static inline pgoff_t swp_offset(swp_ent
return entry.val & SWP_OFFSET_MASK(entry);
}
+/* check whether a pte points to a swap entry */
+static inline int is_swap_pte(pte_t pte)
+{
+ return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
+}
+
/*
* Convert the arch-dependent pte representation of a swp_entry_t into an
* arch-independent swp_entry_t.
Index: l/mm/migrate.c
===================================================================
--- l.orig/mm/migrate.c 2007-10-09 17:37:59.000000000 -0500
+++ l/mm/migrate.c 2007-10-10 11:46:34.000000000 -0500
@@ -114,11 +114,6 @@ int putback_lru_pages(struct list_head *
return count;
}
-static inline int is_swap_pte(pte_t pte)
-{
- return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
-}
-
/*
* Restore a potential migration pte to a working pte entry
*/
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 4/12] maps4: introduce a generic page walker
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (2 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 3/12] maps4: move is_swap_pte Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 5/12] maps4: use pagewalker in clear_refs and smaps Matt Mackall
` (7 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
Introduce a general page table walker
Signed-off-by: Matt Mackall <mpm@selenic.com>
Index: l/include/linux/mm.h
===================================================================
--- l.orig/include/linux/mm.h 2007-10-26 11:19:36.000000000 -0500
+++ l/include/linux/mm.h 2007-10-26 11:20:37.000000000 -0500
@@ -773,6 +773,28 @@ unsigned long unmap_vmas(struct mmu_gath
struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
+
+/**
+ * mm_walk - callbacks for walk_page_range
+ * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
+ * @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
+ * @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
+ * @pte_entry: if set, called for each non-empty PTE (4th-level) entry
+ * @pte_hole: if set, called for each hole at all levels
+ *
+ * (see walk_page_range for more details)
+ */
+struct mm_walk {
+ int (*pgd_entry)(pgd_t *, unsigned long, unsigned long, void *);
+ int (*pud_entry)(pud_t *, unsigned long, unsigned long, void *);
+ int (*pmd_entry)(pmd_t *, unsigned long, unsigned long, void *);
+ int (*pte_entry)(pte_t *, unsigned long, unsigned long, void *);
+ int (*pte_hole)(unsigned long, unsigned long, void *);
+};
+
+int walk_page_range(const struct mm_struct *, unsigned long addr,
+ unsigned long end, const struct mm_walk *walk,
+ void *private);
void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
unsigned long end, unsigned long floor, unsigned long ceiling);
void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
Index: l/mm/Makefile
===================================================================
--- l.orig/mm/Makefile 2007-10-26 11:19:36.000000000 -0500
+++ l/mm/Makefile 2007-10-26 11:20:01.000000000 -0500
@@ -5,7 +5,7 @@
mmu-y := nommu.o
mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \
mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
- vmalloc.o
+ vmalloc.o pagewalk.o
obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
page_alloc.o page-writeback.o pdflush.o \
Index: l/mm/pagewalk.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ l/mm/pagewalk.c 2007-10-26 11:20:01.000000000 -0500
@@ -0,0 +1,131 @@
+#include <linux/mm.h>
+#include <linux/highmem.h>
+#include <linux/sched.h>
+
+static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+ const struct mm_walk *walk, void *private)
+{
+ pte_t *pte;
+ int err = 0;
+
+ pte = pte_offset_map(pmd, addr);
+ do {
+ err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, private);
+ if (err)
+ break;
+ } while (pte++, addr += PAGE_SIZE, addr != end);
+
+ pte_unmap(pte);
+ return err;
+}
+
+static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
+ const struct mm_walk *walk, void *private)
+{
+ pmd_t *pmd;
+ unsigned long next;
+ int err = 0;
+
+ pmd = pmd_offset(pud, addr);
+ do {
+ next = pmd_addr_end(addr, end);
+ if (pmd_none_or_clear_bad(pmd)) {
+ if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, private);
+ if (err)
+ break;
+ continue;
+ }
+ if (walk->pmd_entry)
+ err = walk->pmd_entry(pmd, addr, next, private);
+ if (!err && walk->pte_entry)
+ err = walk_pte_range(pmd, addr, next, walk, private);
+ if (err)
+ break;
+ } while (pmd++, addr = next, addr != end);
+
+ return err;
+}
+
+static int walk_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end,
+ const struct mm_walk *walk, void *private)
+{
+ pud_t *pud;
+ unsigned long next;
+ int err = 0;
+
+ pud = pud_offset(pgd, addr);
+ do {
+ next = pud_addr_end(addr, end);
+ if (pud_none_or_clear_bad(pud)) {
+ if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, private);
+ if (err)
+ break;
+ continue;
+ }
+ if (walk->pud_entry)
+ err = walk->pud_entry(pud, addr, next, private);
+ if (!err && (walk->pmd_entry || walk->pte_entry))
+ err = walk_pmd_range(pud, addr, next, walk, private);
+ if (err)
+ break;
+ } while (pud++, addr = next, addr != end);
+
+ return err;
+}
+
+/**
+ * walk_page_range - walk a memory map's page tables with a callback
+ * @mm - memory map to walk
+ * @addr - starting address
+ * @end - ending address
+ * @walk - set of callbacks to invoke for each level of the tree
+ * @private - private data passed to the callback function
+ *
+ * Recursively walk the page table for the memory area in a VMA,
+ * calling supplied callbacks. Callbacks are called in-order (first
+ * PGD, first PUD, first PMD, first PTE, second PTE... second PMD,
+ * etc.). If lower-level callbacks are omitted, walking depth is reduced.
+ *
+ * Each callback receives an entry pointer, the start and end of the
+ * associated range, and a caller-supplied private data pointer.
+ *
+ * No locks are taken, but the bottom level iterator will map PTE
+ * directories from highmem if necessary.
+ *
+ * If any callback returns a non-zero value, the walk is aborted and
+ * the return value is propagated back to the caller. Otherwise 0 is returned.
+ */
+int walk_page_range(const struct mm_struct *mm,
+ unsigned long addr, unsigned long end,
+ const struct mm_walk *walk, void *private)
+{
+ pgd_t *pgd;
+ unsigned long next;
+ int err = 0;
+
+ if (addr >= end)
+ return err;
+
+ pgd = pgd_offset(mm, addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ if (pgd_none_or_clear_bad(pgd)) {
+ if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, private);
+ if (err)
+ break;
+ continue;
+ }
+ if (walk->pgd_entry)
+ err = walk->pgd_entry(pgd, addr, next, private);
+ if (!err &&
+ (walk->pud_entry || walk->pmd_entry || walk->pte_entry))
+ err = walk_pud_range(pgd, addr, next, walk, private);
+ if (err)
+ break;
+ } while (pgd++, addr = next, addr != end);
+
+ return err;
+}
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 5/12] maps4: use pagewalker in clear_refs and smaps
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (3 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 4/12] maps4: introduce a generic page walker Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 6/12] maps4: simplify interdependence of maps " Matt Mackall
` (6 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
Use the generic pagewalker for smaps and clear_refs
Signed-off-by: Matt Mackall <mpm@selenic.com>
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-22 16:24:47.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-22 17:51:18.000000000 -0500
@@ -135,6 +135,7 @@ static void pad_len_spaces(struct seq_fi
struct mem_size_stats
{
+ struct vm_area_struct *vma;
unsigned long resident;
unsigned long shared_clean;
unsigned long shared_dirty;
@@ -144,13 +145,6 @@ struct mem_size_stats
u64 pss;
};
-struct pmd_walker {
- struct vm_area_struct *vma;
- void *private;
- void (*action)(struct vm_area_struct *, pmd_t *, unsigned long,
- unsigned long, void *);
-};
-
static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss)
{
struct proc_maps_private *priv = m->private;
@@ -240,11 +234,11 @@ static int show_map(struct seq_file *m,
return show_map_internal(m, v, NULL);
}
-static void smaps_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- void *private)
+static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+ void *private)
{
struct mem_size_stats *mss = private;
+ struct vm_area_struct *vma = mss->vma;
pte_t *pte, ptent;
spinlock_t *ptl;
struct page *page;
@@ -282,12 +276,13 @@ static void smaps_pte_range(struct vm_ar
}
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
+ return 0;
}
-static void clear_refs_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- void *private)
+static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, void *private)
{
+ struct vm_area_struct *vma = private;
pte_t *pte, ptent;
spinlock_t *ptl;
struct page *page;
@@ -308,71 +303,10 @@ static void clear_refs_pte_range(struct
}
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
+ return 0;
}
-static inline void walk_pmd_range(struct pmd_walker *walker, pud_t *pud,
- unsigned long addr, unsigned long end)
-{
- pmd_t *pmd;
- unsigned long next;
-
- for (pmd = pmd_offset(pud, addr); addr != end;
- pmd++, addr = next) {
- next = pmd_addr_end(addr, end);
- if (pmd_none_or_clear_bad(pmd))
- continue;
- walker->action(walker->vma, pmd, addr, next, walker->private);
- }
-}
-
-static inline void walk_pud_range(struct pmd_walker *walker, pgd_t *pgd,
- unsigned long addr, unsigned long end)
-{
- pud_t *pud;
- unsigned long next;
-
- for (pud = pud_offset(pgd, addr); addr != end;
- pud++, addr = next) {
- next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud))
- continue;
- walk_pmd_range(walker, pud, addr, next);
- }
-}
-
-/*
- * walk_page_range - walk the page tables of a VMA with a callback
- * @vma - VMA to walk
- * @action - callback invoked for every bottom-level (PTE) page table
- * @private - private data passed to the callback function
- *
- * Recursively walk the page table for the memory area in a VMA, calling
- * a callback for every bottom-level (PTE) page table.
- */
-static inline void walk_page_range(struct vm_area_struct *vma,
- void (*action)(struct vm_area_struct *,
- pmd_t *, unsigned long,
- unsigned long, void *),
- void *private)
-{
- unsigned long addr = vma->vm_start;
- unsigned long end = vma->vm_end;
- struct pmd_walker walker = {
- .vma = vma,
- .private = private,
- .action = action,
- };
- pgd_t *pgd;
- unsigned long next;
-
- for (pgd = pgd_offset(vma->vm_mm, addr); addr != end;
- pgd++, addr = next) {
- next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd))
- continue;
- walk_pud_range(&walker, pgd, addr, next);
- }
-}
+static struct mm_walk smaps_walk = { .pmd_entry = smaps_pte_range };
static int show_smap(struct seq_file *m, void *v)
{
@@ -380,11 +314,15 @@ static int show_smap(struct seq_file *m,
struct mem_size_stats mss;
memset(&mss, 0, sizeof mss);
+ mss.vma = vma;
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
- walk_page_range(vma, smaps_pte_range, &mss);
+ walk_page_range(vma->vm_mm, vma->vm_start, vma->vm_end,
+ &smaps_walk, &mss);
return show_map_internal(m, v, &mss);
}
+static struct mm_walk clear_refs_walk = { .pmd_entry = clear_refs_pte_range };
+
void clear_refs_smap(struct mm_struct *mm)
{
struct vm_area_struct *vma;
@@ -392,7 +330,8 @@ void clear_refs_smap(struct mm_struct *m
down_read(&mm->mmap_sem);
for (vma = mm->mmap; vma; vma = vma->vm_next)
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
- walk_page_range(vma, clear_refs_pte_range, NULL);
+ walk_page_range(vma->vm_mm, vma->vm_start, vma->vm_end,
+ &clear_refs_walk, vma);
flush_tlb_mm(mm);
up_read(&mm->mmap_sem);
}
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 6/12] maps4: simplify interdependence of maps and smaps
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (4 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 5/12] maps4: use pagewalker in clear_refs and smaps Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 21:22 ` David Rientjes
2007-10-26 16:36 ` [PATCH 7/12] maps4: move clear_refs code to task_mmu.c Matt Mackall
` (5 subsequent siblings)
11 siblings, 1 reply; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Matt Mackall <mpm@selenic.com>
This pulls the shared map display code out of show_map and puts it in
show_smap where it belongs.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-22 15:59:29.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-22 16:00:49.000000000 -0500
@@ -145,7 +145,7 @@ struct mem_size_stats
u64 pss;
};
-static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss)
+static int show_map(struct seq_file *m, void *v)
{
struct proc_maps_private *priv = m->private;
struct task_struct *task = priv->task;
@@ -205,35 +205,11 @@ static int show_map_internal(struct seq_
}
seq_putc(m, '\n');
- if (mss)
- seq_printf(m,
- "Size: %8lu kB\n"
- "Rss: %8lu kB\n"
- "Pss: %8lu kB\n"
- "Shared_Clean: %8lu kB\n"
- "Shared_Dirty: %8lu kB\n"
- "Private_Clean: %8lu kB\n"
- "Private_Dirty: %8lu kB\n"
- "Referenced: %8lu kB\n",
- (vma->vm_end - vma->vm_start) >> 10,
- mss->resident >> 10,
- (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
- mss->shared_clean >> 10,
- mss->shared_dirty >> 10,
- mss->private_clean >> 10,
- mss->private_dirty >> 10,
- mss->referenced >> 10);
-
if (m->count < m->size) /* vma is copied successfully */
m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;
return 0;
}
-static int show_map(struct seq_file *m, void *v)
-{
- return show_map_internal(m, v, NULL);
-}
-
static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
void *private)
{
@@ -312,13 +288,37 @@ static int show_smap(struct seq_file *m,
{
struct vm_area_struct *vma = v;
struct mem_size_stats mss;
+ int ret;
memset(&mss, 0, sizeof mss);
mss.vma = vma;
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
walk_page_range(vma->vm_mm, vma->vm_start, vma->vm_end,
&smaps_walk, &mss);
- return show_map_internal(m, v, &mss);
+
+ ret = show_map(m, v);
+ if (ret)
+ return ret;
+
+ seq_printf(m,
+ "Size: %8lu kB\n"
+ "Rss: %8lu kB\n"
+ "Pss: %8lu kB\n"
+ "Shared_Clean: %8lu kB\n"
+ "Shared_Dirty: %8lu kB\n"
+ "Private_Clean: %8lu kB\n"
+ "Private_Dirty: %8lu kB\n"
+ "Referenced: %8lu kB\n",
+ (vma->vm_end - vma->vm_start) >> 10,
+ mss.resident >> 10,
+ (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
+ mss.shared_clean >> 10,
+ mss.shared_dirty >> 10,
+ mss.private_clean >> 10,
+ mss.private_dirty >> 10,
+ mss.referenced >> 10);
+
+ return ret;
}
static struct mm_walk clear_refs_walk = { .pmd_entry = clear_refs_pte_range };
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 7/12] maps4: move clear_refs code to task_mmu.c
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (5 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 6/12] maps4: simplify interdependence of maps " Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 8/12] maps4: regroup task_mmu by interface Matt Mackall
` (4 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Matt Mackall <mpm@selenic.com>
This puts all the clear_refs code where it belongs and probably lets things
compile on MMU-less systems as well.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Index: l/fs/proc/base.c
===================================================================
--- l.orig/fs/proc/base.c 2007-10-16 21:13:45.000000000 -0500
+++ l/fs/proc/base.c 2007-10-22 16:23:05.000000000 -0500
@@ -85,10 +85,6 @@
* in /proc for a task before it execs a suid executable.
*/
-
-/* Worst case buffer size needed for holding an integer. */
-#define PROC_NUMBUF 13
-
struct pid_entry {
char *name;
int len;
@@ -713,42 +709,6 @@ static const struct file_operations proc
.write = oom_adjust_write,
};
-#ifdef CONFIG_MMU
-static ssize_t clear_refs_write(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
-{
- struct task_struct *task;
- char buffer[PROC_NUMBUF], *end;
- struct mm_struct *mm;
-
- memset(buffer, 0, sizeof(buffer));
- if (count > sizeof(buffer) - 1)
- count = sizeof(buffer) - 1;
- if (copy_from_user(buffer, buf, count))
- return -EFAULT;
- if (!simple_strtol(buffer, &end, 0))
- return -EINVAL;
- if (*end == '\n')
- end++;
- task = get_proc_task(file->f_path.dentry->d_inode);
- if (!task)
- return -ESRCH;
- mm = get_task_mm(task);
- if (mm) {
- clear_refs_smap(mm);
- mmput(mm);
- }
- put_task_struct(task);
- if (end - buffer == 0)
- return -EIO;
- return end - buffer;
-}
-
-static struct file_operations proc_clear_refs_operations = {
- .write = clear_refs_write,
-};
-#endif
-
#ifdef CONFIG_AUDITSYSCALL
#define TMPBUFLEN 21
static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
Index: l/fs/proc/internal.h
===================================================================
--- l.orig/fs/proc/internal.h 2007-10-16 21:13:45.000000000 -0500
+++ l/fs/proc/internal.h 2007-10-22 16:19:32.000000000 -0500
@@ -49,11 +49,7 @@ extern int proc_pid_statm(struct task_st
extern const struct file_operations proc_maps_operations;
extern const struct file_operations proc_numa_maps_operations;
extern const struct file_operations proc_smaps_operations;
-
-extern const struct file_operations proc_maps_operations;
-extern const struct file_operations proc_numa_maps_operations;
-extern const struct file_operations proc_smaps_operations;
-
+extern const struct file_operations proc_clear_refs_operations;
void free_proc_entry(struct proc_dir_entry *de);
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-22 16:00:49.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-22 16:22:54.000000000 -0500
@@ -323,19 +323,47 @@ static int show_smap(struct seq_file *m,
static struct mm_walk clear_refs_walk = { .pmd_entry = clear_refs_pte_range };
-void clear_refs_smap(struct mm_struct *mm)
+static ssize_t clear_refs_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
{
+ struct task_struct *task;
+ char buffer[PROC_NUMBUF], *end;
+ struct mm_struct *mm;
struct vm_area_struct *vma;
- down_read(&mm->mmap_sem);
- for (vma = mm->mmap; vma; vma = vma->vm_next)
- if (vma->vm_mm && !is_vm_hugetlb_page(vma))
- walk_page_range(vma->vm_mm, vma->vm_start, vma->vm_end,
- &clear_refs_walk, vma);
- flush_tlb_mm(mm);
- up_read(&mm->mmap_sem);
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count))
+ return -EFAULT;
+ if (!simple_strtol(buffer, &end, 0))
+ return -EINVAL;
+ if (*end == '\n')
+ end++;
+ task = get_proc_task(file->f_path.dentry->d_inode);
+ if (!task)
+ return -ESRCH;
+ mm = get_task_mm(task);
+ if (mm) {
+ down_read(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next)
+ if (!is_vm_hugetlb_page(vma))
+ walk_page_range(mm, vma->vm_start, vma->vm_end,
+ &clear_refs_walk, vma);
+ flush_tlb_mm(mm);
+ up_read(&mm->mmap_sem);
+ mmput(mm);
+ }
+ put_task_struct(task);
+ if (end - buffer == 0)
+ return -EIO;
+ return end - buffer;
}
+const struct file_operations proc_clear_refs_operations = {
+ .write = clear_refs_write,
+};
+
static void *m_start(struct seq_file *m, loff_t *pos)
{
struct proc_maps_private *priv = m->private;
Index: l/include/linux/proc_fs.h
===================================================================
--- l.orig/include/linux/proc_fs.h 2007-10-16 21:13:45.000000000 -0500
+++ l/include/linux/proc_fs.h 2007-10-22 16:22:43.000000000 -0500
@@ -18,6 +18,8 @@ struct completion;
*/
#define FIRST_PROCESS_ENTRY 256
+/* Worst case buffer size needed for holding an integer. */
+#define PROC_NUMBUF 13
/*
* We always define these enumerators
@@ -116,7 +118,6 @@ int proc_pid_readdir(struct file * filp,
unsigned long task_vsize(struct mm_struct *);
int task_statm(struct mm_struct *, int *, int *, int *, int *);
char *task_mem(struct mm_struct *, char *);
-void clear_refs_smap(struct mm_struct *mm);
struct proc_dir_entry *de_get(struct proc_dir_entry *de);
void de_put(struct proc_dir_entry *de);
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 8/12] maps4: regroup task_mmu by interface
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (6 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 7/12] maps4: move clear_refs code to task_mmu.c Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 9/12] maps4: add /proc/pid/pagemap interface Matt Mackall
` (3 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Matt Mackall <mpm@selenic.com>
Reorder source so that all the code and data for each interface is together.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-26 11:20:54.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-26 11:29:24.000000000 -0500
@@ -114,36 +114,123 @@ static void pad_len_spaces(struct seq_fi
seq_printf(m, "%*c", len, ' ');
}
-/*
- * Proportional Set Size(PSS): my share of RSS.
- *
- * PSS of a process is the count of pages it has in memory, where each
- * page is divided by the number of processes sharing it. So if a
- * process has 1000 pages all to itself, and 1000 shared with one other
- * process, its PSS will be 1500.
- *
- * To keep (accumulated) division errors low, we adopt a 64bit
- * fixed-point pss counter to minimize division errors. So (pss >>
- * PSS_SHIFT) would be the real byte count.
- *
- * A shift of 12 before division means (assuming 4K page size):
- * - 1M 3-user-pages add up to 8KB errors;
- * - supports mapcount up to 2^24, or 16M;
- * - supports PSS up to 2^52 bytes, or 4PB.
- */
-#define PSS_SHIFT 12
+static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
+{
+ if (vma && vma != priv->tail_vma) {
+ struct mm_struct *mm = vma->vm_mm;
+ up_read(&mm->mmap_sem);
+ mmput(mm);
+ }
+}
-struct mem_size_stats
+static void *m_start(struct seq_file *m, loff_t *pos)
{
- struct vm_area_struct *vma;
- unsigned long resident;
- unsigned long shared_clean;
- unsigned long shared_dirty;
- unsigned long private_clean;
- unsigned long private_dirty;
- unsigned long referenced;
- u64 pss;
-};
+ struct proc_maps_private *priv = m->private;
+ unsigned long last_addr = m->version;
+ struct mm_struct *mm;
+ struct vm_area_struct *vma, *tail_vma = NULL;
+ loff_t l = *pos;
+
+ /* Clear the per syscall fields in priv */
+ priv->task = NULL;
+ priv->tail_vma = NULL;
+
+ /*
+ * We remember last_addr rather than next_addr to hit with
+ * mmap_cache most of the time. We have zero last_addr at
+ * the beginning and also after lseek. We will have -1 last_addr
+ * after the end of the vmas.
+ */
+
+ if (last_addr == -1UL)
+ return NULL;
+
+ priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
+ if (!priv->task)
+ return NULL;
+
+ mm = get_task_mm(priv->task);
+ if (!mm)
+ return NULL;
+
+ tail_vma = get_gate_vma(priv->task);
+ priv->tail_vma = tail_vma;
+ down_read(&mm->mmap_sem);
+
+ /* Start with last addr hint */
+ vma = find_vma(mm, last_addr);
+ if (last_addr && vma) {
+ vma = vma->vm_next;
+ goto out;
+ }
+
+ /*
+ * Check the vma index is within the range and do
+ * sequential scan until m_index.
+ */
+ vma = NULL;
+ if ((unsigned long)l < mm->map_count) {
+ vma = mm->mmap;
+ while (l-- && vma)
+ vma = vma->vm_next;
+ goto out;
+ }
+
+ if (l != mm->map_count)
+ tail_vma = NULL; /* After gate vma */
+
+out:
+ if (vma)
+ return vma;
+
+ /* End of vmas has been reached */
+ m->version = (tail_vma != NULL)? 0: -1UL;
+ up_read(&mm->mmap_sem);
+ mmput(mm);
+ return tail_vma;
+}
+
+static void *m_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct proc_maps_private *priv = m->private;
+ struct vm_area_struct *vma = v;
+ struct vm_area_struct *tail_vma = priv->tail_vma;
+
+ (*pos)++;
+ if (vma && (vma != tail_vma) && vma->vm_next)
+ return vma->vm_next;
+ vma_stop(priv, vma);
+ return (vma != tail_vma)? tail_vma: NULL;
+}
+
+static void m_stop(struct seq_file *m, void *v)
+{
+ struct proc_maps_private *priv = m->private;
+ struct vm_area_struct *vma = v;
+
+ vma_stop(priv, vma);
+ if (priv->task)
+ put_task_struct(priv->task);
+}
+
+static int do_maps_open(struct inode *inode, struct file *file,
+ struct seq_operations *ops)
+{
+ struct proc_maps_private *priv;
+ int ret = -ENOMEM;
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (priv) {
+ priv->pid = proc_pid(inode);
+ ret = seq_open(file, ops);
+ if (!ret) {
+ struct seq_file *m = file->private_data;
+ m->private = priv;
+ } else {
+ kfree(priv);
+ }
+ }
+ return ret;
+}
static int show_map(struct seq_file *m, void *v)
{
@@ -210,6 +297,56 @@ static int show_map(struct seq_file *m,
return 0;
}
+static struct seq_operations proc_pid_maps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_map
+};
+
+static int maps_open(struct inode *inode, struct file *file)
+{
+ return do_maps_open(inode, file, &proc_pid_maps_op);
+}
+
+const struct file_operations proc_maps_operations = {
+ .open = maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+/*
+ * Proportional Set Size(PSS): my share of RSS.
+ *
+ * PSS of a process is the count of pages it has in memory, where each
+ * page is divided by the number of processes sharing it. So if a
+ * process has 1000 pages all to itself, and 1000 shared with one other
+ * process, its PSS will be 1500.
+ *
+ * To keep (accumulated) division errors low, we adopt a 64bit
+ * fixed-point pss counter to minimize division errors. So (pss >>
+ * PSS_SHIFT) would be the real byte count.
+ *
+ * A shift of 12 before division means (assuming 4K page size):
+ * - 1M 3-user-pages add up to 8KB errors;
+ * - supports mapcount up to 2^24, or 16M;
+ * - supports PSS up to 2^52 bytes, or 4PB.
+ */
+#define PSS_SHIFT 12
+
+struct mem_size_stats
+{
+ struct vm_area_struct *vma;
+ unsigned long resident;
+ unsigned long shared_clean;
+ unsigned long shared_dirty;
+ unsigned long private_clean;
+ unsigned long private_dirty;
+ unsigned long referenced;
+ u64 pss;
+};
+
static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
void *private)
{
@@ -255,33 +392,6 @@ static int smaps_pte_range(pmd_t *pmd, u
return 0;
}
-static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
- unsigned long end, void *private)
-{
- struct vm_area_struct *vma = private;
- pte_t *pte, ptent;
- spinlock_t *ptl;
- struct page *page;
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
- ptent = *pte;
- if (!pte_present(ptent))
- continue;
-
- page = vm_normal_page(vma, addr, ptent);
- if (!page)
- continue;
-
- /* Clear accessed and referenced bits. */
- ptep_test_and_clear_young(vma, addr, pte);
- ClearPageReferenced(page);
- }
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
- return 0;
-}
-
static struct mm_walk smaps_walk = { .pmd_entry = smaps_pte_range };
static int show_smap(struct seq_file *m, void *v)
@@ -321,6 +431,52 @@ static int show_smap(struct seq_file *m,
return ret;
}
+static struct seq_operations proc_pid_smaps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_smap
+};
+
+static int smaps_open(struct inode *inode, struct file *file)
+{
+ return do_maps_open(inode, file, &proc_pid_smaps_op);
+}
+
+const struct file_operations proc_smaps_operations = {
+ .open = smaps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, void *private)
+{
+ struct vm_area_struct *vma = private;
+ pte_t *pte, ptent;
+ spinlock_t *ptl;
+ struct page *page;
+
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ for (; addr != end; pte++, addr += PAGE_SIZE) {
+ ptent = *pte;
+ if (!pte_present(ptent))
+ continue;
+
+ page = vm_normal_page(vma, addr, ptent);
+ if (!page)
+ continue;
+
+ /* Clear accessed and referenced bits. */
+ ptep_test_and_clear_young(vma, addr, pte);
+ ClearPageReferenced(page);
+ }
+ pte_unmap_unlock(pte - 1, ptl);
+ cond_resched();
+ return 0;
+}
+
static struct mm_walk clear_refs_walk = { .pmd_entry = clear_refs_pte_range };
static ssize_t clear_refs_write(struct file *file, const char __user *buf,
@@ -364,148 +520,6 @@ const struct file_operations proc_clear_
.write = clear_refs_write,
};
-static void *m_start(struct seq_file *m, loff_t *pos)
-{
- struct proc_maps_private *priv = m->private;
- unsigned long last_addr = m->version;
- struct mm_struct *mm;
- struct vm_area_struct *vma, *tail_vma = NULL;
- loff_t l = *pos;
-
- /* Clear the per syscall fields in priv */
- priv->task = NULL;
- priv->tail_vma = NULL;
-
- /*
- * We remember last_addr rather than next_addr to hit with
- * mmap_cache most of the time. We have zero last_addr at
- * the beginning and also after lseek. We will have -1 last_addr
- * after the end of the vmas.
- */
-
- if (last_addr == -1UL)
- return NULL;
-
- priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
- if (!priv->task)
- return NULL;
-
- mm = get_task_mm(priv->task);
- if (!mm)
- return NULL;
-
- priv->tail_vma = tail_vma = get_gate_vma(priv->task);
- down_read(&mm->mmap_sem);
-
- /* Start with last addr hint */
- if (last_addr && (vma = find_vma(mm, last_addr))) {
- vma = vma->vm_next;
- goto out;
- }
-
- /*
- * Check the vma index is within the range and do
- * sequential scan until m_index.
- */
- vma = NULL;
- if ((unsigned long)l < mm->map_count) {
- vma = mm->mmap;
- while (l-- && vma)
- vma = vma->vm_next;
- goto out;
- }
-
- if (l != mm->map_count)
- tail_vma = NULL; /* After gate vma */
-
-out:
- if (vma)
- return vma;
-
- /* End of vmas has been reached */
- m->version = (tail_vma != NULL)? 0: -1UL;
- up_read(&mm->mmap_sem);
- mmput(mm);
- return tail_vma;
-}
-
-static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
-{
- if (vma && vma != priv->tail_vma) {
- struct mm_struct *mm = vma->vm_mm;
- up_read(&mm->mmap_sem);
- mmput(mm);
- }
-}
-
-static void *m_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct proc_maps_private *priv = m->private;
- struct vm_area_struct *vma = v;
- struct vm_area_struct *tail_vma = priv->tail_vma;
-
- (*pos)++;
- if (vma && (vma != tail_vma) && vma->vm_next)
- return vma->vm_next;
- vma_stop(priv, vma);
- return (vma != tail_vma)? tail_vma: NULL;
-}
-
-static void m_stop(struct seq_file *m, void *v)
-{
- struct proc_maps_private *priv = m->private;
- struct vm_area_struct *vma = v;
-
- vma_stop(priv, vma);
- if (priv->task)
- put_task_struct(priv->task);
-}
-
-static struct seq_operations proc_pid_maps_op = {
- .start = m_start,
- .next = m_next,
- .stop = m_stop,
- .show = show_map
-};
-
-static struct seq_operations proc_pid_smaps_op = {
- .start = m_start,
- .next = m_next,
- .stop = m_stop,
- .show = show_smap
-};
-
-static int do_maps_open(struct inode *inode, struct file *file,
- struct seq_operations *ops)
-{
- struct proc_maps_private *priv;
- int ret = -ENOMEM;
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
- if (priv) {
- priv->pid = proc_pid(inode);
- ret = seq_open(file, ops);
- if (!ret) {
- struct seq_file *m = file->private_data;
- m->private = priv;
- } else {
- kfree(priv);
- }
- }
- return ret;
-}
-
-static int maps_open(struct inode *inode, struct file *file)
-{
- return do_maps_open(inode, file, &proc_pid_maps_op);
-}
-
-const struct file_operations proc_maps_operations = {
- .open = maps_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release_private,
-};
-
#ifdef CONFIG_NUMA
extern int show_numa_map(struct seq_file *m, void *v);
@@ -540,14 +554,3 @@ const struct file_operations proc_numa_m
};
#endif
-static int smaps_open(struct inode *inode, struct file *file)
-{
- return do_maps_open(inode, file, &proc_pid_smaps_op);
-}
-
-const struct file_operations proc_smaps_operations = {
- .open = smaps_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release_private,
-};
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 9/12] maps4: add /proc/pid/pagemap interface
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (7 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 8/12] maps4: regroup task_mmu by interface Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 10/12] maps4: add /proc/kpagecount interface Matt Mackall
` (2 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Matt Mackall <mpm@selenic.com>
This interface provides a mapping for each page in an address space to its
physical page frame number, allowing precise determination of what pages are
mapped and what pages are shared between processes.
New in this version:
- headers gone again (as recommended by Dave Hansen and Alan Cox)
- 64-bit entries (as per discussion with Andi Kleen)
- swap pte information exported (from Dave Hansen)
- page walker callback for holes (from Dave Hansen)
- direct put_user I/O (as suggested by Rusty Russell)
This patch folds in cleanups and swap PTE support from Dave Hansen
<haveblue@us.ibm.com>.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Index: l/fs/proc/base.c
===================================================================
--- l.orig/fs/proc/base.c 2007-10-26 11:29:16.000000000 -0500
+++ l/fs/proc/base.c 2007-10-26 11:33:16.000000000 -0500
@@ -631,7 +631,7 @@ out_no_task:
}
#endif
-static loff_t mem_lseek(struct file * file, loff_t offset, int orig)
+loff_t mem_lseek(struct file *file, loff_t offset, int orig)
{
switch (orig) {
case 0:
@@ -2030,6 +2030,7 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_MMU
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps", S_IRUGO, smaps),
+ REG("pagemap", S_IRUSR, pagemap),
#endif
#ifdef CONFIG_SECURITY
DIR("attr", S_IRUGO|S_IXUGO, attr_dir),
@@ -2316,6 +2317,7 @@ static const struct pid_entry tid_base_s
#ifdef CONFIG_MMU
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps", S_IRUGO, smaps),
+ REG("pagemap", S_IRUSR, pagemap),
#endif
#ifdef CONFIG_SECURITY
DIR("attr", S_IRUGO|S_IXUGO, attr_dir),
Index: l/fs/proc/internal.h
===================================================================
--- l.orig/fs/proc/internal.h 2007-10-26 11:29:15.000000000 -0500
+++ l/fs/proc/internal.h 2007-10-26 11:33:30.000000000 -0500
@@ -45,11 +45,13 @@ extern int proc_tid_stat(struct task_str
extern int proc_tgid_stat(struct task_struct *, char *);
extern int proc_pid_status(struct task_struct *, char *);
extern int proc_pid_statm(struct task_struct *, char *);
+extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);
extern const struct file_operations proc_maps_operations;
extern const struct file_operations proc_numa_maps_operations;
extern const struct file_operations proc_smaps_operations;
extern const struct file_operations proc_clear_refs_operations;
+extern const struct file_operations proc_pagemap_operations;
void free_proc_entry(struct proc_dir_entry *de);
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-26 11:29:24.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-26 11:34:33.000000000 -0500
@@ -5,7 +5,10 @@
#include <linux/highmem.h>
#include <linux/ptrace.h>
#include <linux/pagemap.h>
+#include <linux/ptrace.h>
#include <linux/mempolicy.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
#include <asm/elf.h>
#include <asm/uaccess.h>
@@ -520,6 +523,201 @@ const struct file_operations proc_clear_
.write = clear_refs_write,
};
+struct pagemapread {
+ char __user *out, *end;
+};
+
+#define PM_ENTRY_BYTES sizeof(u64)
+#define PM_RESERVED_BITS 3
+#define PM_RESERVED_OFFSET (64 - PM_RESERVED_BITS)
+#define PM_RESERVED_MASK (((1LL<<PM_RESERVED_BITS)-1) << PM_RESERVED_OFFSET)
+#define PM_SPECIAL(nr) (((nr) << PM_RESERVED_OFFSET) | PM_RESERVED_MASK)
+#define PM_NOT_PRESENT PM_SPECIAL(1LL)
+#define PM_SWAP PM_SPECIAL(2LL)
+#define PM_END_OF_BUFFER 1
+
+static int add_to_pagemap(unsigned long addr, u64 pfn,
+ struct pagemapread *pm)
+{
+ /*
+ * Make sure there's room in the buffer for an
+ * entire entry. Otherwise, only copy part of
+ * the pfn.
+ */
+ if (pm->out + PM_ENTRY_BYTES >= pm->end) {
+ if (copy_to_user(pm->out, &pfn, pm->end - pm->out))
+ return -EFAULT;
+ pm->out = pm->end;
+ return PM_END_OF_BUFFER;
+ }
+
+ if (put_user(pfn, pm->out))
+ return -EFAULT;
+ pm->out += PM_ENTRY_BYTES;
+ return 0;
+}
+
+static int pagemap_pte_hole(unsigned long start, unsigned long end,
+ void *private)
+{
+ struct pagemapread *pm = private;
+ unsigned long addr;
+ int err = 0;
+ for (addr = start; addr < end; addr += PAGE_SIZE) {
+ err = add_to_pagemap(addr, PM_NOT_PRESENT, pm);
+ if (err)
+ break;
+ }
+ return err;
+}
+
+u64 swap_pte_to_pagemap_entry(pte_t pte)
+{
+ swp_entry_t e = pte_to_swp_entry(pte);
+ return PM_SWAP | swp_type(e) | (swp_offset(e) << MAX_SWAPFILES_SHIFT);
+}
+
+static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+ void *private)
+{
+ struct pagemapread *pm = private;
+ pte_t *pte;
+ int err = 0;
+
+ pte = pte_offset_map(pmd, addr);
+ for (; addr != end; pte++, addr += PAGE_SIZE) {
+ u64 pfn = PM_NOT_PRESENT;
+ if (is_swap_pte(*pte))
+ pfn = swap_pte_to_pagemap_entry(*pte);
+ else if (pte_present(*pte))
+ pfn = pte_pfn(*pte);
+ err = add_to_pagemap(addr, pfn, pm);
+ if (err)
+ return err;
+ }
+ pte_unmap(pte - 1);
+
+ cond_resched();
+
+ return err;
+}
+
+static struct mm_walk pagemap_walk = {
+ .pmd_entry = pagemap_pte_range,
+ .pte_hole = pagemap_pte_hole
+};
+
+/*
+ * /proc/pid/pagemap - an array mapping virtual pages to pfns
+ *
+ * For each page in the address space, this file contains one 64-bit
+ * entry representing the corresponding physical page frame number
+ * (PFN) if the page is present. If there is a swap entry for the
+ * physical page, then an encoding of the swap file number and the
+ * page's offset into the swap file are returned. If no page is
+ * present at all, PM_NOT_PRESENT is returned. This allows determining
+ * precisely which pages are mapped (or in swap) and comparing mapped
+ * pages between processes.
+ *
+ * Efficient users of this interface will use /proc/pid/maps to
+ * determine which areas of memory are actually mapped and llseek to
+ * skip over unmapped regions.
+ */
+static ssize_t pagemap_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
+ struct page **pages, *page;
+ unsigned long uaddr, uend;
+ struct mm_struct *mm;
+ struct pagemapread pm;
+ int pagecount;
+ int ret = -ESRCH;
+
+ if (!task)
+ goto out;
+
+ ret = -EACCES;
+ if (!ptrace_may_attach(task))
+ goto out;
+
+ ret = -EINVAL;
+ /* file position must be aligned */
+ if (*ppos % PM_ENTRY_BYTES)
+ goto out;
+
+ ret = 0;
+ mm = get_task_mm(task);
+ if (!mm)
+ goto out;
+
+ ret = -ENOMEM;
+ uaddr = (unsigned long)buf & PAGE_MASK;
+ uend = (unsigned long)(buf + count);
+ pagecount = (PAGE_ALIGN(uend) - uaddr) / PAGE_SIZE;
+ pages = kmalloc(pagecount * sizeof(struct page *), GFP_KERNEL);
+ if (!pages)
+ goto out_task;
+
+ down_read(¤t->mm->mmap_sem);
+ ret = get_user_pages(current, current->mm, uaddr, pagecount,
+ 1, 0, pages, NULL);
+ up_read(¤t->mm->mmap_sem);
+
+ if (ret < 0)
+ goto out_free;
+
+ pm.out = buf;
+ pm.end = buf + count;
+
+ if (!ptrace_may_attach(task)) {
+ ret = -EIO;
+ } else {
+ unsigned long src = *ppos;
+ unsigned long svpfn = src / PM_ENTRY_BYTES;
+ unsigned long start_vaddr = svpfn << PAGE_SHIFT;
+ unsigned long end_vaddr = TASK_SIZE_OF(task);
+
+ /* watch out for wraparound */
+ if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT)
+ start_vaddr = end_vaddr;
+
+ /*
+ * The odds are that this will stop walking way
+ * before end_vaddr, because the length of the
+ * user buffer is tracked in "pm", and the walk
+ * will stop when we hit the end of the buffer.
+ */
+ ret = walk_page_range(mm, start_vaddr, end_vaddr,
+ &pagemap_walk, &pm);
+ if (ret == PM_END_OF_BUFFER)
+ ret = 0;
+ /* don't need mmap_sem for these, but this looks cleaner */
+ *ppos += pm.out - buf;
+ if (!ret)
+ ret = pm.out - buf;
+ }
+
+ for (; pagecount; pagecount--) {
+ page = pages[pagecount-1];
+ if (!PageReserved(page))
+ SetPageDirty(page);
+ page_cache_release(page);
+ }
+ mmput(mm);
+out_free:
+ kfree(pages);
+out_task:
+ put_task_struct(task);
+out:
+ return ret;
+}
+
+const struct file_operations proc_pagemap_operations = {
+ .llseek = mem_lseek, /* borrow this */
+ .read = pagemap_read,
+};
+
#ifdef CONFIG_NUMA
extern int show_numa_map(struct seq_file *m, void *v);
@@ -553,4 +751,3 @@ const struct file_operations proc_numa_m
.release = seq_release_private,
};
#endif
-
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 10/12] maps4: add /proc/kpagecount interface
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (8 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 9/12] maps4: add /proc/pid/pagemap interface Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 11/12] maps4: add /proc/kpageflags interface Matt Mackall
2007-10-26 16:36 ` [PATCH 12/12] maps4: make page monitoring /proc file optional Matt Mackall
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Matt Mackall <mpm@selenic.com>
This makes physical page map counts available to userspace. Together
with /proc/pid/pagemap and /proc/pid/clear_refs, this can be used to
monitor memory usage on a per-page basis.
[bunk@stusta.de: make struct proc_kpagemap static]
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Index: l/fs/proc/proc_misc.c
===================================================================
--- l.orig/fs/proc/proc_misc.c 2007-10-22 16:24:34.000000000 -0500
+++ l/fs/proc/proc_misc.c 2007-10-22 17:53:48.000000000 -0500
@@ -46,6 +46,7 @@
#include <linux/vmalloc.h>
#include <linux/crash_dump.h>
#include <linux/pid_namespace.h>
+#include <linux/bootmem.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/io.h>
@@ -656,6 +657,57 @@ static const struct file_operations proc
};
#endif
+#define KPMSIZE sizeof(u64)
+#define KPMMASK (KPMSIZE - 1)
+/* /proc/kpagecount - an array exposing page counts
+ *
+ * Each entry is a u64 representing the corresponding
+ * physical page count.
+ */
+static ssize_t kpagecount_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ u64 __user *out = (u64 __user *)buf;
+ struct page *ppage;
+ unsigned long src = *ppos;
+ unsigned long pfn;
+ ssize_t ret = 0;
+ u64 pcount;
+
+ if (!access_ok(VERIFY_WRITE, buf, count))
+ return -EFAULT;
+
+ pfn = src / KPMSIZE;
+ count = min_t(size_t, count, (max_pfn * KPMSIZE) - src);
+ if (src & KPMMASK || count & KPMMASK)
+ return -EIO;
+
+ while (count > 0) {
+ ppage = pfn_to_page(pfn++);
+ if (!ppage)
+ pcount = 0;
+ else
+ pcount = atomic_read(&ppage->_count);
+
+ if (put_user(pcount, out++)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ count -= KPMSIZE;
+ }
+
+ *ppos += (char __user *)out - buf;
+ if (!ret)
+ ret = (char __user *)out - buf;
+ return ret;
+}
+
+static struct file_operations proc_kpagecount_operations = {
+ .llseek = mem_lseek,
+ .read = kpagecount_read,
+};
+
struct proc_dir_entry *proc_root_kcore;
void create_seq_entry(char *name, mode_t mode, const struct file_operations *f)
@@ -735,6 +787,7 @@ void __init proc_misc_init(void)
(size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
}
#endif
+ create_seq_entry("kpagecount", S_IRUSR, &proc_kpagecount_operations);
#ifdef CONFIG_PROC_VMCORE
proc_vmcore = create_proc_entry("vmcore", S_IRUSR, NULL);
if (proc_vmcore)
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 11/12] maps4: add /proc/kpageflags interface
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (9 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 10/12] maps4: add /proc/kpagecount interface Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
2007-10-26 16:36 ` [PATCH 12/12] maps4: make page monitoring /proc file optional Matt Mackall
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
From: Matt Mackall <mpm@selenic.com>
This makes a subset of physical page flags available to userspace. Together
with /proc/pid/kpagemap, this allows tracking of a wide variety of VM behaviors.
Exported flags are decoupled from the kernel's internal flags. This
allows us to reorder flag bits, and synthesize any bits that get
redefined in terms of other bits.
Signed-off-by: Matt Mackall <mpm@selenic.com>
---
Index: l/fs/proc/proc_misc.c
===================================================================
--- l.orig/fs/proc/proc_misc.c 2007-10-22 17:53:48.000000000 -0500
+++ l/fs/proc/proc_misc.c 2007-10-22 18:55:25.000000000 -0500
@@ -708,6 +708,84 @@ static struct file_operations proc_kpage
.read = kpagecount_read,
};
+/* /proc/kpageflags - an array exposing page flags
+ *
+ * Each entry is a u64 representing the corresponding
+ * physical page flags.
+ */
+
+/* These macros are used to decouple internal flags from exported ones */
+
+#define KPF_LOCKED 0
+#define KPF_ERROR 1
+#define KPF_REFERENCED 2
+#define KPF_UPTODATE 3
+#define KPF_DIRTY 4
+#define KPF_LRU 5
+#define KPF_ACTIVE 6
+#define KPF_SLAB 7
+#define KPF_WRITEBACK 8
+#define KPF_RECLAIM 9
+#define KPF_BUDDY 10
+
+#define kpf_copy_bit(flags, srcpos, dstpos) (((flags >> srcpos) & 1) << dstpos)
+
+static ssize_t kpageflags_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ u64 __user *out = (u64 __user *)buf;
+ struct page *ppage;
+ unsigned long src = *ppos;
+ unsigned long pfn;
+ ssize_t ret = 0;
+ u64 kflags, uflags;
+
+ if (!access_ok(VERIFY_WRITE, buf, count))
+ return -EFAULT;
+
+ pfn = src / KPMSIZE;
+ count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
+ if (src & KPMMASK || count & KPMMASK)
+ return -EIO;
+
+ while (count > 0) {
+ ppage = pfn_to_page(pfn++);
+ if (!ppage)
+ kflags = 0;
+ else
+ kflags = ppage->flags;
+
+ uflags = kpf_copy_bit(KPF_LOCKED, PG_locked, kflags) |
+ kpf_copy_bit(kflags, KPF_ERROR, PG_error) |
+ kpf_copy_bit(kflags, KPF_REFERENCED, PG_referenced) |
+ kpf_copy_bit(kflags, KPF_UPTODATE, PG_uptodate) |
+ kpf_copy_bit(kflags, KPF_DIRTY, PG_dirty) |
+ kpf_copy_bit(kflags, KPF_LRU, PG_lru) |
+ kpf_copy_bit(kflags, KPF_ACTIVE, PG_active) |
+ kpf_copy_bit(kflags, KPF_SLAB, PG_slab) |
+ kpf_copy_bit(kflags, KPF_WRITEBACK, PG_writeback) |
+ kpf_copy_bit(kflags, KPF_RECLAIM, PG_reclaim) |
+ kpf_copy_bit(kflags, KPF_BUDDY, PG_buddy);
+
+ if (put_user(uflags, out++)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ count -= KPMSIZE;
+ }
+
+ *ppos += (char __user *)out - buf;
+ if (!ret)
+ ret = (char __user *)out - buf;
+ return ret;
+}
+
+static struct file_operations proc_kpageflags_operations = {
+ .llseek = mem_lseek,
+ .read = kpageflags_read,
+};
+
struct proc_dir_entry *proc_root_kcore;
void create_seq_entry(char *name, mode_t mode, const struct file_operations *f)
@@ -788,6 +866,7 @@ void __init proc_misc_init(void)
}
#endif
create_seq_entry("kpagecount", S_IRUSR, &proc_kpagecount_operations);
+ create_seq_entry("kpageflags", S_IRUSR, &proc_kpageflags_operations);
#ifdef CONFIG_PROC_VMCORE
proc_vmcore = create_proc_entry("vmcore", S_IRUSR, NULL);
if (proc_vmcore)
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 12/12] maps4: make page monitoring /proc file optional
2007-10-26 16:36 [PATCH 0/12] maps4: pagemap monitoring v4 Matt Mackall
` (10 preceding siblings ...)
2007-10-26 16:36 ` [PATCH 11/12] maps4: add /proc/kpageflags interface Matt Mackall
@ 2007-10-26 16:36 ` Matt Mackall
11 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2007-10-26 16:36 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Dave Hansen, Rusty Russell, Jeremy Fitzhardinge, David Rientjes,
Fengguang Wu
Make /proc/ page monitoring configurable
This puts the following files under an embedded config option:
/proc/pid/clear_refs
/proc/pid/smaps
/proc/pid/pagemap
/proc/kpagecount
/proc/kpageflags
Signed-off-by: Matt Mackall <mpm@selenic.com>
Index: l/fs/proc/base.c
===================================================================
--- l.orig/fs/proc/base.c 2007-10-26 11:33:16.000000000 -0500
+++ l/fs/proc/base.c 2007-10-26 11:34:47.000000000 -0500
@@ -2027,7 +2027,7 @@ static const struct pid_entry tgid_base_
LNK("exe", exe),
REG("mounts", S_IRUGO, mounts),
REG("mountstats", S_IRUSR, mountstats),
-#ifdef CONFIG_MMU
+#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps", S_IRUGO, smaps),
REG("pagemap", S_IRUSR, pagemap),
@@ -2314,7 +2314,7 @@ static const struct pid_entry tid_base_s
LNK("root", root),
LNK("exe", exe),
REG("mounts", S_IRUGO, mounts),
-#ifdef CONFIG_MMU
+#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps", S_IRUGO, smaps),
REG("pagemap", S_IRUSR, pagemap),
Index: l/fs/proc/proc_misc.c
===================================================================
--- l.orig/fs/proc/proc_misc.c 2007-10-26 11:34:43.000000000 -0500
+++ l/fs/proc/proc_misc.c 2007-10-26 11:34:47.000000000 -0500
@@ -657,6 +657,7 @@ static const struct file_operations proc
};
#endif
+#ifdef CONFIG_PROC_PAGE_MONITOR
#define KPMSIZE sizeof(u64)
#define KPMMASK (KPMSIZE - 1)
/* /proc/kpagecount - an array exposing page counts
@@ -785,6 +786,7 @@ static struct file_operations proc_kpage
.llseek = mem_lseek,
.read = kpageflags_read,
};
+#endif /* CONFIG_PROC_PAGE_MONITOR */
struct proc_dir_entry *proc_root_kcore;
@@ -865,8 +867,10 @@ void __init proc_misc_init(void)
(size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
}
#endif
+#ifdef CONFIG_PROC_PAGE_MONITOR
create_seq_entry("kpagecount", S_IRUSR, &proc_kpagecount_operations);
create_seq_entry("kpageflags", S_IRUSR, &proc_kpageflags_operations);
+#endif
#ifdef CONFIG_PROC_VMCORE
proc_vmcore = create_proc_entry("vmcore", S_IRUSR, NULL);
if (proc_vmcore)
Index: l/fs/proc/task_mmu.c
===================================================================
--- l.orig/fs/proc/task_mmu.c 2007-10-26 11:34:33.000000000 -0500
+++ l/fs/proc/task_mmu.c 2007-10-26 11:34:47.000000000 -0500
@@ -338,6 +338,7 @@ const struct file_operations proc_maps_o
*/
#define PSS_SHIFT 12
+#ifdef CONFIG_PROC_PAGE_MONITOR
struct mem_size_stats
{
struct vm_area_struct *vma;
@@ -717,6 +718,7 @@ const struct file_operations proc_pagema
.llseek = mem_lseek, /* borrow this */
.read = pagemap_read,
};
+#endif /* CONFIG_PROC_PAGE_MONITOR */
#ifdef CONFIG_NUMA
extern int show_numa_map(struct seq_file *m, void *v);
Index: l/init/Kconfig
===================================================================
--- l.orig/init/Kconfig 2007-10-26 11:19:35.000000000 -0500
+++ l/init/Kconfig 2007-10-26 11:34:47.000000000 -0500
@@ -571,6 +571,16 @@ config SLOB
endchoice
+config PROC_PAGE_MONITOR
+ default y
+ depends PROC_FS && MMU
+ bool "Enable /proc page monitoring" if EMBEDDED
+ help
+ Various /proc files exist to monitor process memory utilization:
+ /proc/pid/smaps, /proc/pid/clear_refs, /proc/pid/pagemap,
+ /proc/kpagecount, and /proc/kpageflags. Disabling these
+ interfaces will reduce the size of the kernel by approximately 4kb.
+
endmenu # General setup
config RT_MUTEXES
Index: l/mm/Makefile
===================================================================
--- l.orig/mm/Makefile 2007-10-26 11:20:01.000000000 -0500
+++ l/mm/Makefile 2007-10-26 11:34:47.000000000 -0500
@@ -5,7 +5,7 @@
mmu-y := nommu.o
mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \
mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
- vmalloc.o pagewalk.o
+ vmalloc.o
obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
page_alloc.o page-writeback.o pdflush.o \
@@ -13,6 +13,7 @@ obj-y := bootmem.o filemap.o mempool.o
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
$(mmu-y)
+obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
^ permalink raw reply [flat|nested] 15+ messages in thread