linux-mm.kvack.org archive mirror
* [PATCH] Debugging: Keep track of page owners
@ 2012-12-05  1:12 Dave Hansen
  2012-12-05 15:01 ` Laura Abbott
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2012-12-05  1:12 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Dave Hansen

From: mel@skynet.ie (Mel Gorman)

PAGE_OWNER tracks free pages by setting page->order to -1.  However, it is
only set during __free_pages(), which is not the only free path:
__pagevec_free() and free_compound_page() do not go through __free_pages().
This leads to a situation where free pages are visible in page_owner,
which is confusing and might be interpreted as a memory leak.

This patch sets page->order when PageBuddy is set.  It also prints a
warning to the kernel log if a free page is found that does not appear free
to PAGE_OWNER.  This should be considered a fix to
page-owner-tracking-leak-detector.patch.
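
The crux of the fix, as it lands in set_page_order() in the diff below:
page->order becomes a "this page is free" marker the moment a page
enters the buddy allocator:

	static inline void set_page_order(struct page *page, int order)
	{
		set_page_private(page, order);
		__SetPageBuddy(page);
	#ifdef CONFIG_PAGE_OWNER
		page->order = -1;	/* free, as far as PAGE_OWNER cares */
	#endif
	}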

This only applies to -mm as PAGE_OWNER is not in mainline.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
Print out PAGE_OWNER statistics in relation to fragmentation avoidance
EDESC
From: Mel Gorman <mel@csn.ul.ie>

When PAGE_OWNER is set, more information of relevance to fragmentation
avoidance is available.  A second line is added to 'page_owner' showing
the PFN, the pageblock number, the mobility type of the page based on its
allocation flags, whether the allocation is improperly placed, and the
page flags.
A sample entry looks like

Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags      LA
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114

This information can be used to identify pages that are improperly placed.  As
the format of PAGE_OWNER data is now different, the comment at the top of
Documentation/page_owner.c is updated with new instructions.

As PAGE_OWNER tracks the GFP flags used to allocate the pages,
/proc/pagetypeinfo is enhanced to report how many mixed blocks exist.  The
additional output looks like

Number of mixed blocks    Unmovable  Reclaimable      Movable      Reserve
Node 0, zone      DMA            0            1            2            1
Node 0, zone   Normal            2           11           33            0

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
DESC
Allow PAGE_OWNER to be set on any architecture
EDESC
From: Mel Gorman <mel@csn.ul.ie>

Currently PAGE_OWNER depends on CONFIG_X86.  This appears to be due to
pfn_to_page() being called in a manner inappropriate for many memory models,
and due to the presence of memory holes.  This patch ensures that
pfn_valid() and pfn_valid_within() are called at the appropriate places and
the offsets correctly updated so that PAGE_OWNER is safe on any architecture.

In situations where CONFIG_HOLES_IN_ZONES is set (IA64 with
VIRTUAL_MEM_MAP), pages allocated within a MAX_ORDER_NR_PAGES block
may not be displayed in 'page_owner' if the hole is at the start of the
block.  Addressing this would be quite complex, would perform slowly, and
is of no clear benefit.
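
For reference, a condensed sketch of the walk this results in (helper
names as used in the diff below):

	/* Validate each MAX_ORDER_NR_PAGES block once, then each PFN in it */
	for (; pfn < max_pfn; pfn++) {
		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn)) {
			pfn += MAX_ORDER_NR_PAGES - 1;	/* skip the whole hole */
			continue;
		}
		if (!pfn_valid_within(pfn))	/* hole inside a valid block */
			continue;
		page = pfn_to_page(pfn);	/* now safe on any memory model */
		/* ... examine page->order, page->trace, ... */
	}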

Once PAGE_OWNER is allowed on all architectures, the statistics for
grouping pages by mobility that report how many pageblocks contain mixed
page types become optionally available on all arches.

This patch was tested successfully on x86, x86_64, ppc64 and IA64 machines.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix
EDESC
From: Andrew Morton <akpm@linux-foundation.org>

Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix fix
EDESC
From: mel@skynet.ie (Mel Gorman)

Page-owner-tracking stores a backtrace of an allocation in the struct
page.  How the stack trace is generated depends on whether
CONFIG_FRAME_POINTER is set.  If CONFIG_FRAME_POINTER is set, the
frame pointer must be read using some inline assembler, which is not
available on all architectures.

This patch uses the frame pointer where it is available but has a fallback
where it is not.
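
For reference, the frame-pointer path amounts to a register read on x86
(excerpt from the diff below); when bp reads back as zero, the fallback
scans the raw stack for addresses that pass __kernel_text_address():

	unsigned long bp = 0;
	#ifdef CONFIG_X86_64
		asm ("movq %%rbp, %0" : "=r" (bp));
	#endif
	#ifdef CONFIG_X86_32
		asm ("movl %%ebp, %0" : "=r" (bp));
	#endif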

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/Documentation/page_owner.c |  141 +++++++++++++++++++++
 linux-2.6.git-dave/include/linux/mm_types.h   |    5 
 linux-2.6.git-dave/lib/Kconfig.debug          |   11 +
 linux-2.6.git-dave/mm/Makefile                |    1 
 linux-2.6.git-dave/mm/page_alloc.c            |   76 +++++++++++
 linux-2.6.git-dave/mm/pageowner.c             |  174 ++++++++++++++++++++++++++
 linux-2.6.git-dave/mm/vmstat.c                |   93 +++++++++++++
 7 files changed, 501 insertions(+)

diff -puN /dev/null Documentation/page_owner.c
--- /dev/null	2012-06-13 15:09:09.708529931 -0400
+++ linux-2.6.git-dave/Documentation/page_owner.c	2012-12-04 20:07:28.580380696 -0500
@@ -0,0 +1,141 @@
+/*
+ * User-space helper to sort the output of /sys/kernel/debug/page_owner
+ *
+ * Example use:
+ * cat /sys/kernel/debug/page_owner > page_owner_full.txt
+ * grep -v ^PFN page_owner_full.txt > page_owner.txt
+ * ./sort page_owner.txt sorted_page_owner.txt
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+struct block_list {
+	char *txt;
+	int len;
+	int num;
+};
+
+
+static struct block_list *list;
+static int list_size;
+static int max_size;
+
+struct block_list *block_head;
+
+int read_block(char *buf, FILE *fin)
+{
+	int ret = 0;
+	int hit = 0;
+	char *curr = buf;
+
+	for (;;) {
+		*curr = getc(fin);
+		if (*curr == EOF) return -1;
+
+		ret++;
+		if (*curr == '\n' && hit == 1)
+			return ret - 1;
+		else if (*curr == '\n')
+			hit = 1;
+		else
+			hit = 0;
+		curr++;
+	}
+}
+
+static int compare_txt(struct block_list *l1, struct block_list *l2)
+{
+	return strcmp(l1->txt, l2->txt);
+}
+
+static int compare_num(struct block_list *l1, struct block_list *l2)
+{
+	return l2->num - l1->num;
+}
+
+static void add_list(char *buf, int len)
+{
+	if (list_size != 0 &&
+	    len == list[list_size-1].len &&
+	    memcmp(buf, list[list_size-1].txt, len) == 0) {
+		list[list_size-1].num++;
+		return;
+	}
+	if (list_size == max_size) {
+		printf("max_size too small??\n");
+		exit(1);
+	}
+	list[list_size].txt = malloc(len+1);
+	list[list_size].len = len;
+	list[list_size].num = 1;
+	memcpy(list[list_size].txt, buf, len);
+	list[list_size].txt[len] = 0;
+	list_size++;
+	if (list_size % 1000 == 0) {
+		printf("loaded %d\r", list_size);
+		fflush(stdout);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	FILE *fin, *fout;
+	char buf[1024];
+	int ret, i, count;
+	struct block_list *list2;
+	struct stat st;
+
+	fin = fopen(argv[1], "r");
+	fout = fopen(argv[2], "w");
+	if (!fin || !fout) {
+		printf("Usage: ./program <input> <output>\n");
+		perror("open: ");
+		exit(2);
+	}
+
+	fstat(fileno(fin), &st);
+	max_size = st.st_size / 100; /* hack ... */
+
+	list = malloc(max_size * sizeof(*list));
+
+	for(;;) {
+		ret = read_block(buf, fin);
+		if (ret < 0)
+			break;
+
+		buf[ret] = '\0';
+		add_list(buf, ret);
+	}
+
+	printf("loaded %d\n", list_size);
+
+	printf("sorting ....\n");
+
+	qsort(list, list_size, sizeof(list[0]), compare_txt);
+
+	list2 = malloc(sizeof(*list) * list_size);
+
+	printf("culling\n");
+
+	for (i=count=0;i<list_size;i++) {
+		if (count == 0 ||
+		    strcmp(list2[count-1].txt, list[i].txt) != 0) {
+			list2[count++] = list[i];
+		} else {
+			list2[count-1].num += list[i].num;
+		}
+	}
+
+	qsort(list2, count, sizeof(list[0]), compare_num);
+
+	for (i=0;i<count;i++) {
+		fprintf(fout, "%d times:\n%s\n", list2[i].num, list2[i].txt);
+	}
+	return 0;
+}
diff -puN include/linux/mm_types.h~pageowner include/linux/mm_types.h
--- linux-2.6.git/include/linux/mm_types.h~pageowner	2012-12-04 20:06:36.795943398 -0500
+++ linux-2.6.git-dave/include/linux/mm_types.h	2012-12-04 20:06:36.807943499 -0500
@@ -175,6 +175,11 @@ struct page {
 	 */
 	void *shadow;
 #endif
+#ifdef CONFIG_PAGE_OWNER
+	int order;
+	unsigned int gfp_mask;
+	unsigned long trace[8];
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff -puN lib/Kconfig.debug~pageowner lib/Kconfig.debug
--- linux-2.6.git/lib/Kconfig.debug~pageowner	2012-12-04 20:06:36.795943398 -0500
+++ linux-2.6.git-dave/lib/Kconfig.debug	2012-12-04 20:06:36.811943533 -0500
@@ -99,6 +99,17 @@ config UNUSED_SYMBOLS
 	  you really need it, and what the merge plan to the mainline kernel for
 	  your module is.
 
+config PAGE_OWNER
+	bool "Track page owner"
+	depends on DEBUG_KERNEL
+	select DEBUG_FS
+	help
+	  This keeps track of what call chain is the owner of a page, may
+	  help to find bare alloc_page(s) leaks. Eats a fair amount of memory.
+	  See Documentation/page_owner.c for user-space helper.
+
+	  If unsure, say N.
+
 config DEBUG_FS
 	bool "Debug Filesystem"
 	help
diff -puN mm/Makefile~pageowner mm/Makefile
--- linux-2.6.git/mm/Makefile~pageowner	2012-12-04 20:06:36.799943431 -0500
+++ linux-2.6.git-dave/mm/Makefile	2012-12-04 20:06:36.811943533 -0500
@@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
+obj-$(CONFIG_PAGE_OWNER) += pageowner.o
diff -puN mm/page_alloc.c~pageowner mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~pageowner	2012-12-04 20:06:36.803943465 -0500
+++ linux-2.6.git-dave/mm/page_alloc.c	2012-12-04 20:06:36.811943533 -0500
@@ -437,6 +437,9 @@ static inline void set_page_order(struct
 {
 	set_page_private(page, order);
 	__SetPageBuddy(page);
+#ifdef CONFIG_PAGE_OWNER
+	page->order = -1;
+#endif
 }
 
 static inline void rmv_page_order(struct page *page)
@@ -2250,6 +2253,63 @@ __perform_reclaim(gfp_t gfp_mask, unsign
 	return progress;
 }
 
+#ifdef CONFIG_PAGE_OWNER
+static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
+{
+	return	p > (void *)tinfo &&
+		p < (void *)tinfo + THREAD_SIZE - 3;
+}
+
+static inline void __stack_trace(struct page *page, unsigned long *stack,
+			unsigned long bp)
+{
+	int i = 0;
+	unsigned long addr;
+	struct thread_info *tinfo = (struct thread_info *)
+		((unsigned long)stack & (~(THREAD_SIZE - 1)));
+
+	memset(page->trace, 0, sizeof(long) * 8);
+
+#ifdef CONFIG_FRAME_POINTER
+	if (bp) {
+		while (valid_stack_ptr(tinfo, (void *)bp)) {
+			addr = *(unsigned long *)(bp + sizeof(long));
+			page->trace[i] = addr;
+			if (++i >= 8)
+				break;
+			bp = *(unsigned long *)bp;
+		}
+		return;
+	}
+#endif /* CONFIG_FRAME_POINTER */
+	while (valid_stack_ptr(tinfo, stack)) {
+		addr = *stack++;
+		if (__kernel_text_address(addr)) {
+			page->trace[i] = addr;
+			if (++i >= 8)
+				break;
+		}
+	}
+}
+
+static void set_page_owner(struct page *page, unsigned int order,
+			unsigned int gfp_mask)
+{
+	unsigned long address;
+	unsigned long bp = 0;
+#ifdef CONFIG_X86_64
+	asm ("movq %%rbp, %0" : "=r" (bp) : );
+#endif
+#ifdef CONFIG_X86_32
+	asm ("movl %%ebp, %0" : "=r" (bp) : );
+#endif
+	page->order = (int) order;
+	page->gfp_mask = gfp_mask;
+	__stack_trace(page, &address, bp);
+}
+#endif /* CONFIG_PAGE_OWNER */
+
+
 /* The really slow allocator path where we enter direct reclaim */
 static inline struct page *
 __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
@@ -2285,6 +2345,10 @@ retry:
 		goto retry;
 	}
 
+#ifdef CONFIG_PAGE_OWNER
+	if (page)
+		set_page_owner(page, order, gfp_mask);
+#endif
 	return page;
 }
 
@@ -2593,6 +2657,10 @@ nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 	return page;
 got_pg:
+#ifdef CONFIG_PAGE_OWNER
+	if (page)
+		set_page_owner(page, order, gfp_mask);
+#endif
 	if (kmemcheck_enabled)
 		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
 
@@ -2665,6 +2733,11 @@ out:
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
 
+#ifdef CONFIG_PAGE_OWNER
+	if (page)
+		set_page_owner(page, order, gfp_mask);
+#endif
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
@@ -3869,6 +3942,9 @@ void __meminit memmap_init_zone(unsigned
 		if (!is_highmem_idx(zone))
 			set_page_address(page, __va(pfn << PAGE_SHIFT));
 #endif
+#ifdef CONFIG_PAGE_OWNER
+		page->order = -1;
+#endif
 	}
 }
 
diff -puN /dev/null mm/pageowner.c
--- /dev/null	2012-06-13 15:09:09.708529931 -0400
+++ linux-2.6.git-dave/mm/pageowner.c	2012-12-04 20:06:36.811943533 -0500
@@ -0,0 +1,174 @@
+#include <linux/debugfs.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/huge_mm.h>
+#include <linux/mount.h>
+#include <linux/seq_file.h>
+#include <linux/highmem.h>
+#include <linux/ptrace.h>
+#include <linux/slab.h>
+#include <linux/pagemap.h>
+#include <linux/mempolicy.h>
+#include <linux/rmap.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+
+#include <asm/elf.h>
+#include <asm/uaccess.h>
+#include <asm/tlbflush.h>
+#include "internal.h"
+
+#include <linux/bootmem.h>
+#include <linux/kallsyms.h>
+
+static ssize_t
+read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+{
+	unsigned long pfn;
+	struct page *page;
+	char *kbuf, *modname;
+	const char *symname;
+	int ret = 0;
+	char namebuf[128];
+	unsigned long offset = 0, symsize;
+	int i;
+	ssize_t num_written = 0;
+	int blocktype = 0, pagetype = 0;
+
+	page = NULL;
+	pfn = min_low_pfn + *ppos;
+
+	/* Find a valid PFN or the start of a MAX_ORDER_NR_PAGES area */
+	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
+		pfn++;
+
+	//printk("pfn: %ld max_pfn: %ld\n", pfn, max_pfn);
+	/* Find an allocated page */
+	for (; pfn < max_pfn; pfn++) {
+		/*
+		 * If the new page is in a new MAX_ORDER_NR_PAGES area,
+		 * validate the area as existing, skip it if not
+		 */
+		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn)) {
+			pfn += MAX_ORDER_NR_PAGES - 1;
+			continue;
+		}
+
+		/* Check for holes within a MAX_ORDER area */
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+
+		/* Catch situations where free pages have a bad ->order  */
+		if (page->order >= 0 && PageBuddy(page))
+			printk(KERN_WARNING
+				"PageOwner info inaccurate for PFN %lu\n",
+				pfn);
+
+		/* Stop search if page is allocated and has trace info */
+		if (page->order >= 0 && page->trace[0]) {
+			//printk("stopped search at pfn: %ld\n", pfn);
+			break;
+		}
+	}
+
+	if (!pfn_valid(pfn))
+		return 0;
+	/*
+	 * If memory does not end at a SECTION_SIZE boundary, then
+	 * we might have a pfn_valid() above max_pfn
+	 */
+	if (pfn >= max_pfn)
+		return 0;
+
+	/* Record the next PFN to read in the file offset */
+	*ppos = (pfn - min_low_pfn) + 1;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	//printk("page: %p\n", page);
+	ret = snprintf(kbuf, count, "Page allocated via order %d, mask 0x%x\n",
+			page->order, page->gfp_mask);
+	if (ret >= count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Print information relevant to grouping pages by mobility */
+	blocktype = get_pageblock_migratetype(page);
+	pagetype  = allocflags_to_migratetype(page->gfp_mask);
+	ret += snprintf(kbuf+ret, count-ret,
+			"PFN %lu Block %lu type %d %s "
+			"Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+			pfn,
+			pfn >> pageblock_order,
+			blocktype,
+			blocktype != pagetype ? "Fallback" : "        ",
+			PageLocked(page)	? "K" : " ",
+			PageError(page)		? "E" : " ",
+			PageReferenced(page)	? "R" : " ",
+			PageUptodate(page)	? "U" : " ",
+			PageDirty(page)		? "D" : " ",
+			PageLRU(page)		? "L" : " ",
+			PageActive(page)	? "A" : " ",
+			PageSlab(page)		? "S" : " ",
+			PageWriteback(page)	? "W" : " ",
+			PageCompound(page)	? "C" : " ",
+			PageSwapCache(page)	? "B" : " ",
+			PageMappedToDisk(page)	? "M" : " ");
+	if (ret >= count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	num_written = ret;
+
+	for (i = 0; i < 8; i++) {
+		if (!page->trace[i])
+			break;
+		symname = kallsyms_lookup(page->trace[i], &symsize, &offset,
+					&modname, namebuf);
+		ret = snprintf(kbuf + num_written, count - num_written,
+				"[0x%lx] %s+%lu\n",
+				page->trace[i], namebuf, offset);
+		if (ret >= count - num_written) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		num_written += ret;
+	}
+
+	ret = snprintf(kbuf + num_written, count - num_written, "\n");
+	if (ret >= count - num_written) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	num_written += ret;
+	ret = num_written;
+
+	if (copy_to_user(buf, kbuf, ret))
+		ret = -EFAULT;
+out:
+	kfree(kbuf);
+	return ret;
+}
+
+static struct file_operations proc_page_owner_operations = {
+	.read		= read_page_owner,
+};
+
+static int __init pageowner_init(void)
+{
+	struct dentry *dentry;
+
+	dentry = debugfs_create_file("page_owner", S_IRUSR, NULL,
+			NULL, &proc_page_owner_operations);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+	return 0;
+}
+module_init(pageowner_init)
diff -puN mm/vmstat.c~pageowner mm/vmstat.c
--- linux-2.6.git/mm/vmstat.c~pageowner	2012-12-04 20:06:36.803943465 -0500
+++ linux-2.6.git-dave/mm/vmstat.c	2012-12-04 20:06:36.815943566 -0500
@@ -19,6 +19,7 @@
 #include <linux/math64.h>
 #include <linux/writeback.h>
 #include <linux/compaction.h>
+#include "internal.h"
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -921,6 +922,97 @@ static int pagetypeinfo_showblockcount(s
 	return 0;
 }
 
+#ifdef CONFIG_PAGE_OWNER
+static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
+							pg_data_t *pgdat,
+							struct zone *zone)
+{
+	int mtype, pagetype;
+	unsigned long pfn;
+	unsigned long start_pfn = zone->zone_start_pfn;
+	unsigned long end_pfn = start_pfn + zone->spanned_pages;
+	unsigned long count[MIGRATE_TYPES] = { 0, };
+
+	/* Align PFNs to pageblock_nr_pages boundary */
+	pfn = start_pfn & ~(pageblock_nr_pages-1);
+
+	/*
+	 * Walk the zone in pageblock_nr_pages steps. If a page block spans
+	 * a zone boundary, it will be double counted between zones. This does
+	 * not matter as the mixed block count will still be correct
+	 */
+	for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		struct page *page;
+		unsigned long offset = 0;
+
+		/* Do not read before the zone start, use a valid page */
+		if (pfn < start_pfn)
+			offset = start_pfn - pfn;
+
+		if (!pfn_valid(pfn + offset))
+			continue;
+
+		page = pfn_to_page(pfn + offset);
+		mtype = get_pageblock_migratetype(page);
+
+		/* Check the block for bad migrate types */
+		for (; offset < pageblock_nr_pages; offset++) {
+			/* Do not go past the end of the zone */
+			if (pfn + offset >= end_pfn)
+				break;
+
+			if (!pfn_valid_within(pfn + offset))
+				continue;
+
+			page = pfn_to_page(pfn + offset);
+
+			/* Skip free pages */
+			if (PageBuddy(page)) {
+				offset += (1UL << page_order(page)) - 1UL;
+				continue;
+			}
+			if (page->order < 0)
+				continue;
+
+			pagetype = allocflags_to_migratetype(page->gfp_mask);
+			if (pagetype != mtype) {
+				count[mtype]++;
+				break;
+			}
+
+			/* Move to end of this allocation */
+			offset += (1 << page->order) - 1;
+		}
+	}
+
+	/* Print counts */
+	seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+		seq_printf(m, "%12lu ", count[mtype]);
+	seq_putc(m, '\n');
+}
+#endif /* CONFIG_PAGE_OWNER */
+
+/*
+ * Print out the number of pageblocks for each migratetype that contain pages
+ * of other types. This gives an indication of how well fallbacks are being
+ * contained by rmqueue_fallback(). It requires information from PAGE_OWNER
+ * to determine what is going on
+ */
+static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
+{
+#ifdef CONFIG_PAGE_OWNER
+	int mtype;
+
+	seq_printf(m, "\n%-23s", "Number of mixed blocks ");
+	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+		seq_printf(m, "%12s ", migratetype_names[mtype]);
+	seq_putc(m, '\n');
+
+	walk_zones_in_node(m, pgdat, pagetypeinfo_showmixedcount_print);
+#endif /* CONFIG_PAGE_OWNER */
+}
+
 /*
  * This prints out statistics in relation to grouping pages by mobility.
  * It is expensive to collect so do not constantly read the file.
@@ -938,6 +1030,7 @@ static int pagetypeinfo_show(struct seq_
 	seq_putc(m, '\n');
 	pagetypeinfo_showfree(m, pgdat);
 	pagetypeinfo_showblockcount(m, pgdat);
+	pagetypeinfo_showmixedcount(m, pgdat);
 
 	return 0;
 }
_


* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-05  1:12 Dave Hansen
@ 2012-12-05 15:01 ` Laura Abbott
  2012-12-05 17:48   ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Laura Abbott @ 2012-12-05 15:01 UTC (permalink / raw)
  To: Dave Hansen; +Cc: akpm, linux-mm

Hi,

This looks really useful. I'd like to see it usable on ARM. A couple of 
quick comments:

On 12/4/2012 5:12 PM, Dave Hansen wrote:
> From: mel@skynet.ie (Mel Gorman)

<snip>
>
> +#ifdef CONFIG_PAGE_OWNER
> +static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
> +{
> +	return	p > (void *)tinfo &&
> +		p < (void *)tinfo + THREAD_SIZE - 3;
> +}
> +
> +static inline void __stack_trace(struct page *page, unsigned long *stack,
> +			unsigned long bp)
> +{
> +	int i = 0;
> +	unsigned long addr;
> +	struct thread_info *tinfo = (struct thread_info *)
> +		((unsigned long)stack & (~(THREAD_SIZE - 1)));
> +
> +	memset(page->trace, 0, sizeof(long) * 8);
> +
> +#ifdef CONFIG_FRAME_POINTER
> +	if (bp) {
> +		while (valid_stack_ptr(tinfo, (void *)bp)) {
> +			addr = *(unsigned long *)(bp + sizeof(long));
> +			page->trace[i] = addr;
> +			if (++i >= 8)
> +				break;
> +			bp = *(unsigned long *)bp;
> +		}
> +		return;
> +	}
> +#endif /* CONFIG_FRAME_POINTER */
> +	while (valid_stack_ptr(tinfo, stack)) {
> +		addr = *stack++;
> +		if (__kernel_text_address(addr)) {
> +			page->trace[i] = addr;
> +			if (++i >= 8)
> +				break;
> +		}
> +	}
> +}
> +
> +static void set_page_owner(struct page *page, unsigned int order,
> +			unsigned int gfp_mask)
> +{
> +	unsigned long address;
> +	unsigned long bp = 0;
> +#ifdef CONFIG_X86_64
> +	asm ("movq %%rbp, %0" : "=r" (bp) : );
> +#endif
> +#ifdef CONFIG_X86_32
> +	asm ("movl %%ebp, %0" : "=r" (bp) : );
> +#endif
> +	page->order = (int) order;
> +	page->gfp_mask = gfp_mask;
> +	__stack_trace(page, &address, bp);
> +}
> +#endif /* CONFIG_PAGE_OWNER */
> +
> +
>   /* The really slow allocator path where we enter direct reclaim */
>   static inline struct page *
>   __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> @@ -2285,6 +2345,10 @@ retry:
>   		goto retry;
>   	}
>
> +#ifdef CONFIG_PAGE_OWNER
> +	if (page)
> +		set_page_owner(page, order, gfp_mask);
> +#endif
>   	return page;
>   }
>
> @@ -2593,6 +2657,10 @@ nopage:
>   	warn_alloc_failed(gfp_mask, order, NULL);
>   	return page;
>   got_pg:
> +#ifdef CONFIG_PAGE_OWNER
> +	if (page)
> +		set_page_owner(page, order, gfp_mask);
> +#endif
>   	if (kmemcheck_enabled)
>   		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
>
> @@ -2665,6 +2733,11 @@ out:
>   	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
>   		goto retry_cpuset;
>
> +#ifdef CONFIG_PAGE_OWNER
> +	if (page)
> +		set_page_owner(page, order, gfp_mask);
> +#endif
> +
>   	return page;
>   }
>   EXPORT_SYMBOL(__alloc_pages_nodemask);
> @@ -3869,6 +3942,9 @@ void __meminit memmap_init_zone(unsigned
>   		if (!is_highmem_idx(zone))
>   			set_page_address(page, __va(pfn << PAGE_SHIFT));
>   #endif
> +#ifdef CONFIG_PAGE_OWNER
> +		page->order = -1;
> +#endif
>   	}
>   }
>

Any reason you are using custom stack saving code instead of using the
save_stack_trace API (include/linux/stacktrace.h)?  This is implemented
on all architectures and takes care of special considerations for
architectures such as ARM.
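
For reference, using that API would boil down to something like the
following sketch (trace_entries is hypothetical here; struct page would
need such a buffer added under CONFIG_PAGE_OWNER):

	struct stack_trace trace = {
		.nr_entries	= 0,
		.max_entries	= ARRAY_SIZE(page->trace_entries),
		.entries	= page->trace_entries,
		.skip		= 0,
	};
	save_stack_trace(&trace);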

<snip>
> diff -puN mm/vmstat.c~pageowner mm/vmstat.c
> --- linux-2.6.git/mm/vmstat.c~pageowner	2012-12-04 20:06:36.803943465 -0500
> +++ linux-2.6.git-dave/mm/vmstat.c	2012-12-04 20:06:36.815943566 -0500
> @@ -19,6 +19,7 @@
>   #include <linux/math64.h>
>   #include <linux/writeback.h>
>   #include <linux/compaction.h>
> +#include "internal.h"
>
>   #ifdef CONFIG_VM_EVENT_COUNTERS
>   DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
> @@ -921,6 +922,97 @@ static int pagetypeinfo_showblockcount(s
>   	return 0;
>   }
>
> +#ifdef CONFIG_PAGE_OWNER
> +static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
> +							pg_data_t *pgdat,
> +							struct zone *zone)
> +{
> +	int mtype, pagetype;
> +	unsigned long pfn;
> +	unsigned long start_pfn = zone->zone_start_pfn;
> +	unsigned long end_pfn = start_pfn + zone->spanned_pages;
> +	unsigned long count[MIGRATE_TYPES] = { 0, };
> +
> +	/* Align PFNs to pageblock_nr_pages boundary */
> +	pfn = start_pfn & ~(pageblock_nr_pages-1);
> +
> +	/*
> +	 * Walk the zone in pageblock_nr_pages steps. If a page block spans
> +	 * a zone boundary, it will be double counted between zones. This does
> +	 * not matter as the mixed block count will still be correct
> +	 */
> +	for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
> +		struct page *page;
> +		unsigned long offset = 0;
> +
> +		/* Do not read before the zone start, use a valid page */
> +		if (pfn < start_pfn)
> +			offset = start_pfn - pfn;
> +
> +		if (!pfn_valid(pfn + offset))
> +			continue;
> +
> +		page = pfn_to_page(pfn + offset);
> +		mtype = get_pageblock_migratetype(page);
> +
> +		/* Check the block for bad migrate types */
> +		for (; offset < pageblock_nr_pages; offset++) {
> +			/* Do not go past the end of the zone */
> +			if (pfn + offset >= end_pfn)
> +				break;
> +
> +			if (!pfn_valid_within(pfn + offset))
> +				continue;
> +
> +			page = pfn_to_page(pfn + offset);
> +
> +			/* Skip free pages */
> +			if (PageBuddy(page)) {
> +				offset += (1UL << page_order(page)) - 1UL;
> +				continue;
> +			}
> +			if (page->order < 0)
> +				continue;
> +
> +			pagetype = allocflags_to_migratetype(page->gfp_mask);
> +			if (pagetype != mtype) {
> +				count[mtype]++;
> +				break;
> +			}
> +
MIGRATE_CMA pages (with CONFIG_CMA) will always have pagetype != mtype 
so CMA pages will always show up here even though they are considered 
movable pages. That's probably not what you want here.
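
One way to avoid that would be to special-case CMA when counting, e.g.
(sketch, using is_migrate_cma() from the CMA series):

	pagetype = allocflags_to_migratetype(page->gfp_mask);
	if (pagetype != mtype) {
		if (is_migrate_cma(pagetype))
			count[MIGRATE_MOVABLE]++;	/* CMA is movable */
		else
			count[mtype]++;
		break;
	}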

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-05 15:01 ` Laura Abbott
@ 2012-12-05 17:48   ` Dave Hansen
  2012-12-05 22:36     ` Andrew Morton
  2012-12-07 16:37     ` Laura Abbott
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Hansen @ 2012-12-05 17:48 UTC (permalink / raw)
  To: Laura Abbott; +Cc: akpm, linux-mm

On 12/05/2012 07:01 AM, Laura Abbott wrote:
> Any reason you are using custom stack saving code instead of using the
> save_stack_trace API (include/linux/stacktrace.h)?  This is implemented
> on all architectures and takes care of special considerations for
> architectures such as ARM.

This is actually an ancient patch that Andrew's been carrying around and
updating periodically.  I didn't duck fast enough and got stuck updating
it. :)

> MIGRATE_CMA pages (with CONFIG_CMA) will always have pagetype != mtype
> so CMA pages will always show up here even though they are considered
> movable pages. That's probably not what you want here.

What do you think the right way to handle it is?  Should we just
check explicitly for CMA pages and punt on them?


* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-05 17:48   ` Dave Hansen
@ 2012-12-05 22:36     ` Andrew Morton
  2012-12-07 16:37     ` Laura Abbott
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2012-12-05 22:36 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Laura Abbott, linux-mm

On Wed, 05 Dec 2012 09:48:00 -0800
Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On 12/05/2012 07:01 AM, Laura Abbott wrote:
> > Any reason you are using custom stack saving code instead of using the
> > save_stack_trace API (include/linux/stacktrace.h)?  This is implemented
> > on all architectures and takes care of special considerations for
> > architectures such as ARM.
> 
> This is actually an ancient patch that Andrew's been carrying around and
> updating periodically.  I didn't duck fast enough and got stuck updating
> it. :)

Yes, it's a sweet little patch and has saved our ass a few times.  It
would be nice if someone were to, umm, productize it and get it merged.

However, do see https://lkml.org/lkml/2009/4/1/137 where Ingo discusses
conversion to using the tracing infrastructure.

btw, the original patch was from the long-lost and dearly missed
Alexander Nyberg <alexn@dsv.su.se>.


* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-05 17:48   ` Dave Hansen
  2012-12-05 22:36     ` Andrew Morton
@ 2012-12-07 16:37     ` Laura Abbott
  1 sibling, 0 replies; 12+ messages in thread
From: Laura Abbott @ 2012-12-07 16:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: akpm, linux-mm

On 12/5/2012 9:48 AM, Dave Hansen wrote:

>> MIGRATE_CMA pages (with CONFIG_CMA) will always have pagetype != mtype
>> so CMA pages will always show up here even though they are considered
>> movable pages. That's probably not what you want here.
>
> What do you think the right way to handle it is?  Should we just
> check explicitly for CMA pages and punt on them?
>

That sounds like a reasonable approach to start.

Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH] Debugging: Keep track of page owners
@ 2012-12-07 21:24 Dave Hansen
  2012-12-07 22:26 ` Andrew Morton
  2012-12-08  2:58 ` Steven Rostedt
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Hansen @ 2012-12-07 21:24 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, Dave Hansen

From: mel@skynet.ie (Mel Gorman)

PAGE_OWNER tracks free pages by setting page->order to -1.  However, it is
only set during __free_pages(), which is not the only free path:
__pagevec_free() and free_compound_page() do not go through __free_pages().
This leads to a situation where free pages are visible in page_owner,
which is confusing and might be interpreted as a memory leak.

This patch sets page->order when PageBuddy is set.  It also prints a
warning to the kernel log if a free page is found that does not appear free
to PAGE_OWNER.  This should be considered a fix to
page-owner-tracking-leak-detector.patch.

This only applies to -mm as PAGE_OWNER is not in mainline.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
Print out PAGE_OWNER statistics in relation to fragmentation avoidance
EDESC
From: Mel Gorman <mel@csn.ul.ie>

When PAGE_OWNER is set, more information of relevance to fragmentation
avoidance is available.  A second line is added to 'page_owner' showing
the PFN, the pageblock number, the mobility type of the page based on its
allocation flags, whether the allocation is improperly placed, and the
page flags.
A sample entry looks like

Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags      LA
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114

This information can be used to identify pages that are improperly placed.  As
the format of PAGE_OWNER data is now different, the comment at the top of
Documentation/page_owner.c is updated with new instructions.

As PAGE_OWNER tracks the GFP flags used to allocate the pages,
/proc/pagetypeinfo is enhanced to report how many mixed blocks exist.  The
additional output looks like

Number of mixed blocks    Unmovable  Reclaimable      Movable      Reserve
Node 0, zone      DMA            0            1            2            1
Node 0, zone   Normal            2           11           33            0

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
DESC
Allow PAGE_OWNER to be set on any architecture
EDESC
From: Mel Gorman <mel@csn.ul.ie>

Currently PAGE_OWNER depends on CONFIG_X86.  This appears to be due to
pfn_to_page() being called in a manner inappropriate for many memory models,
and due to the presence of memory holes.  This patch ensures that
pfn_valid() and pfn_valid_within() are called at the appropriate places and
the offsets correctly updated so that PAGE_OWNER is safe on any architecture.

In situations where CONFIG_HOLES_IN_ZONES is set (IA64 with
VIRTUAL_MEM_MAP), pages allocated within a MAX_ORDER_NR_PAGES block
may not be displayed in 'page_owner' if the hole is at the start of the
block.  Addressing this would be quite complex, would perform slowly, and
is of no clear benefit.

Once PAGE_OWNER is allowed on all architectures, the statistics for
grouping pages by mobility that report how many pageblocks contain mixed
page types become optionally available on all arches.

This patch was tested successfully on x86, x86_64, ppc64 and IA64 machines.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix
EDESC
From: Andrew Morton <akpm@linux-foundation.org>

Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix fix
EDESC
From: mel@skynet.ie (Mel Gorman)

Page-owner-tracking stores a backtrace of an allocation in the struct
page.  How the stack trace is generated depends on whether
CONFIG_FRAME_POINTER is set.  If CONFIG_FRAME_POINTER is set, the
frame pointer must be read using some inline assembler, which is not
available on all architectures.

This patch uses the frame pointer where it is available but has a fallback
where it is not.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/Documentation/page_owner.c |  141 ++++++++++++++++++++++
 linux-2.6.git-dave/include/linux/mm_types.h   |    7 +
 linux-2.6.git-dave/include/linux/stacktrace.h |    3 
 linux-2.6.git-dave/kernel/stacktrace.c        |   23 +++
 linux-2.6.git-dave/lib/Kconfig.debug          |   12 +
 linux-2.6.git-dave/mm/Makefile                |    1 
 linux-2.6.git-dave/mm/page_alloc.c            |   29 ++++
 linux-2.6.git-dave/mm/pageowner.c             |  163 ++++++++++++++++++++++++++
 linux-2.6.git-dave/mm/vmstat.c                |   96 +++++++++++++++
 9 files changed, 475 insertions(+)

diff -puN /dev/null Documentation/page_owner.c
--- /dev/null	2012-06-13 15:09:09.708529931 -0400
+++ linux-2.6.git-dave/Documentation/page_owner.c	2012-12-07 16:22:43.872270758 -0500
@@ -0,0 +1,141 @@
+/*
+ * User-space helper to sort the output of /sys/kernel/debug/page_owner
+ *
+ * Example use:
+ * cat /sys/kernel/debug/page_owner > page_owner_full.txt
+ * grep -v ^PFN page_owner_full.txt > page_owner.txt
+ * ./sort page_owner.txt sorted_page_owner.txt
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+struct block_list {
+	char *txt;
+	int len;
+	int num;
+};
+
+
+static struct block_list *list;
+static int list_size;
+static int max_size;
+
+struct block_list *block_head;
+
+int read_block(char *buf, FILE *fin)
+{
+	int ret = 0;
+	int hit = 0;
+	char *curr = buf;
+
+	for (;;) {
+		*curr = getc(fin);
+		if (*curr == EOF) return -1;
+
+		ret++;
+		if (*curr == '\n' && hit == 1)
+			return ret - 1;
+		else if (*curr == '\n')
+			hit = 1;
+		else
+			hit = 0;
+		curr++;
+	}
+}
+
+static int compare_txt(struct block_list *l1, struct block_list *l2)
+{
+	return strcmp(l1->txt, l2->txt);
+}
+
+static int compare_num(struct block_list *l1, struct block_list *l2)
+{
+	return l2->num - l1->num;
+}
+
+static void add_list(char *buf, int len)
+{
+	if (list_size != 0 &&
+	    len == list[list_size-1].len &&
+	    memcmp(buf, list[list_size-1].txt, len) == 0) {
+		list[list_size-1].num++;
+		return;
+	}
+	if (list_size == max_size) {
+		printf("max_size too small??\n");
+		exit(1);
+	}
+	list[list_size].txt = malloc(len+1);
+	list[list_size].len = len;
+	list[list_size].num = 1;
+	memcpy(list[list_size].txt, buf, len);
+	list[list_size].txt[len] = 0;
+	list_size++;
+	if (list_size % 1000 == 0) {
+		printf("loaded %d\r", list_size);
+		fflush(stdout);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	FILE *fin, *fout;
+	char buf[1024];
+	int ret, i, count;
+	struct block_list *list2;
+	struct stat st;
+
+	fin = fopen(argv[1], "r");
+	fout = fopen(argv[2], "w");
+	if (!fin || !fout) {
+		printf("Usage: ./program <input> <output>\n");
+		perror("open: ");
+		exit(2);
+	}
+
+	fstat(fileno(fin), &st);
+	max_size = st.st_size / 100; /* hack ... */
+
+	list = malloc(max_size * sizeof(*list));
+
+	for(;;) {
+		ret = read_block(buf, fin);
+		if (ret < 0)
+			break;
+
+		buf[ret] = '\0';
+		add_list(buf, ret);
+	}
+
+	printf("loaded %d\n", list_size);
+
+	printf("sorting ....\n");
+
+	qsort(list, list_size, sizeof(list[0]), compare_txt);
+
+	list2 = malloc(sizeof(*list) * list_size);
+
+	printf("culling\n");
+
+	for (i=count=0;i<list_size;i++) {
+		if (count == 0 ||
+		    strcmp(list2[count-1].txt, list[i].txt) != 0) {
+			list2[count++] = list[i];
+		} else {
+			list2[count-1].num += list[i].num;
+		}
+	}
+
+	qsort(list2, count, sizeof(list[0]), compare_num);
+
+	for (i=0;i<count;i++) {
+		fprintf(fout, "%d times:\n%s\n", list2[i].num, list2[i].txt);
+	}
+	return 0;
+}
diff -puN include/linux/mm_types.h~pageowner include/linux/mm_types.h
--- linux-2.6.git/include/linux/mm_types.h~pageowner	2012-12-07 16:22:43.856270620 -0500
+++ linux-2.6.git-dave/include/linux/mm_types.h	2012-12-07 16:22:43.872270758 -0500
@@ -12,6 +12,7 @@
 #include <linux/cpumask.h>
 #include <linux/page-debug-flags.h>
 #include <linux/uprobes.h>
+#include <linux/stacktrace.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -175,6 +176,12 @@ struct page {
 	 */
 	void *shadow;
 #endif
+#ifdef CONFIG_PAGE_OWNER
+	int order;
+	unsigned int gfp_mask;
+	struct stack_trace trace;
+	unsigned long trace_entries[8];
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff -puN include/linux/stacktrace.h~pageowner include/linux/stacktrace.h
--- linux-2.6.git/include/linux/stacktrace.h~pageowner	2012-12-07 16:22:43.856270620 -0500
+++ linux-2.6.git-dave/include/linux/stacktrace.h	2012-12-07 16:22:43.872270758 -0500
@@ -20,6 +20,8 @@ extern void save_stack_trace_tsk(struct
 				struct stack_trace *trace);
 
 extern void print_stack_trace(struct stack_trace *trace, int spaces);
+extern int  snprint_stack_trace(char *buf, int buf_len,
+				struct stack_trace *trace, int spaces);
 
 #ifdef CONFIG_USER_STACKTRACE_SUPPORT
 extern void save_stack_trace_user(struct stack_trace *trace);
@@ -32,6 +34,7 @@ extern void save_stack_trace_user(struct
 # define save_stack_trace_tsk(tsk, trace)		do { } while (0)
 # define save_stack_trace_user(trace)			do { } while (0)
 # define print_stack_trace(trace, spaces)		do { } while (0)
+# define snprint_stack_trace(buf, len, trace, spaces)	do { } while (0)
 #endif
 
 #endif
diff -puN kernel/stacktrace.c~pageowner kernel/stacktrace.c
--- linux-2.6.git/kernel/stacktrace.c~pageowner	2012-12-07 16:22:43.860270654 -0500
+++ linux-2.6.git-dave/kernel/stacktrace.c	2012-12-07 16:22:43.876270793 -0500
@@ -11,6 +11,29 @@
 #include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
 
+int snprint_stack_trace(char *buf, int buf_len, struct stack_trace *trace,
+			int spaces)
+{
+	int ret = 0;
+	int i;
+
+	if (WARN_ON(!trace->entries))
+		return 0;
+
+	for (i = 0; i < trace->nr_entries; i++) {
+		unsigned long ip = trace->entries[i];
+		int printed = snprintf(buf, buf_len, "%*c[<%p>] %pS\n",
+				1 + spaces, ' ',
+				(void *) ip, (void *) ip);
+		buf_len -= printed;
+		ret += printed;
+		buf += printed;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(snprint_stack_trace);
+
 void print_stack_trace(struct stack_trace *trace, int spaces)
 {
 	int i;
diff -puN lib/Kconfig.debug~pageowner lib/Kconfig.debug
--- linux-2.6.git/lib/Kconfig.debug~pageowner	2012-12-07 16:22:43.860270654 -0500
+++ linux-2.6.git-dave/lib/Kconfig.debug	2012-12-07 16:22:43.876270793 -0500
@@ -99,6 +99,18 @@ config UNUSED_SYMBOLS
 	  you really need it, and what the merge plan to the mainline kernel for
 	  your module is.
 
+config PAGE_OWNER
+	bool "Track page owner"
+	depends on DEBUG_KERNEL
+	select DEBUG_FS
+	select STACKTRACE
+	help
+	  This keeps track of what call chain is the owner of a page, may
+	  help to find bare alloc_page(s) leaks. Eats a fair amount of memory.
+	  See Documentation/page_owner.c for user-space helper.
+
+	  If unsure, say N.
+
 config DEBUG_FS
 	bool "Debug Filesystem"
 	help
diff -puN mm/Makefile~pageowner mm/Makefile
--- linux-2.6.git/mm/Makefile~pageowner	2012-12-07 16:22:43.864270689 -0500
+++ linux-2.6.git-dave/mm/Makefile	2012-12-07 16:22:43.876270793 -0500
@@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
+obj-$(CONFIG_PAGE_OWNER) += pageowner.o
diff -puN mm/page_alloc.c~pageowner mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~pageowner	2012-12-07 16:22:43.868270724 -0500
+++ linux-2.6.git-dave/mm/page_alloc.c	2012-12-07 16:22:43.876270793 -0500
@@ -437,6 +437,9 @@ static inline void set_page_order(struct
 {
 	set_page_private(page, order);
 	__SetPageBuddy(page);
+#ifdef CONFIG_PAGE_OWNER
+	page->order = -1;
+#endif
 }
 
 static inline void rmv_page_order(struct page *page)
@@ -2250,6 +2253,22 @@ __perform_reclaim(gfp_t gfp_mask, unsign
 	return progress;
 }
 
+static void set_page_owner(struct page *page, unsigned int order,
+			unsigned int gfp_mask)
+{
+#ifdef CONFIG_PAGE_OWNER
+	struct stack_trace *trace = &page->trace;
+	trace->nr_entries = 0;
+	trace->max_entries = ARRAY_SIZE(page->trace_entries);
+	trace->entries = &page->trace_entries[0];
+	trace->skip = 3;
+	save_stack_trace(&page->trace);
+
+	page->order = (int) order;
+	page->gfp_mask = gfp_mask;
+#endif /* CONFIG_PAGE_OWNER */
+}
+
 /* The really slow allocator path where we enter direct reclaim */
 static inline struct page *
 __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
@@ -2285,6 +2304,8 @@ retry:
 		goto retry;
 	}
 
+	if (page)
+		set_page_owner(page, order, gfp_mask);
 	return page;
 }
 
@@ -2593,6 +2614,8 @@ nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 	return page;
 got_pg:
+	if (page)
+		set_page_owner(page, order, gfp_mask);
 	if (kmemcheck_enabled)
 		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
 
@@ -2665,6 +2688,9 @@ out:
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
 
+	if (page)
+		set_page_owner(page, order, gfp_mask);
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
@@ -3869,6 +3895,9 @@ void __meminit memmap_init_zone(unsigned
 		if (!is_highmem_idx(zone))
 			set_page_address(page, __va(pfn << PAGE_SHIFT));
 #endif
+#ifdef CONFIG_PAGE_OWNER
+		page->order = -1;
+#endif
 	}
 }
 
diff -puN /dev/null mm/pageowner.c
--- /dev/null	2012-06-13 15:09:09.708529931 -0400
+++ linux-2.6.git-dave/mm/pageowner.c	2012-12-07 16:22:43.876270793 -0500
@@ -0,0 +1,163 @@
+#include <linux/debugfs.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/huge_mm.h>
+#include <linux/mount.h>
+#include <linux/seq_file.h>
+#include <linux/highmem.h>
+#include <linux/ptrace.h>
+#include <linux/slab.h>
+#include <linux/pagemap.h>
+#include <linux/mempolicy.h>
+#include <linux/rmap.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+
+#include <asm/elf.h>
+#include <asm/uaccess.h>
+#include <asm/tlbflush.h>
+#include "internal.h"
+
+#include <linux/bootmem.h>
+#include <linux/kallsyms.h>
+
+static ssize_t
+read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+{
+	unsigned long pfn;
+	struct page *page;
+	char *kbuf;
+	int ret = 0;
+	ssize_t num_written = 0;
+	int blocktype = 0, pagetype = 0;
+
+	page = NULL;
+	pfn = min_low_pfn + *ppos;
+
+	/* Find a valid PFN or the start of a MAX_ORDER_NR_PAGES area */
+	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
+		pfn++;
+
+	//printk("pfn: %ld max_pfn: %ld\n", pfn, max_pfn);
+	/* Find an allocated page */
+	for (; pfn < max_pfn; pfn++) {
+		/*
+		 * If the new page is in a new MAX_ORDER_NR_PAGES area,
+		 * validate the area as existing, skip it if not
+		 */
+		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn)) {
+			pfn += MAX_ORDER_NR_PAGES - 1;
+			continue;
+		}
+
+		/* Check for holes within a MAX_ORDER area */
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+
+		/* Catch situations where free pages have a bad ->order  */
+		if (page->order >= 0 && PageBuddy(page))
+			printk(KERN_WARNING
+				"PageOwner info inaccurate for PFN %lu\n",
+				pfn);
+
+		/* Stop search if page is allocated and has trace info */
+		if (page->order >= 0 && page->trace.nr_entries) {
+			//printk("stopped search at pfn: %ld\n", pfn);
+			break;
+		}
+	}
+
+	if (!pfn_valid(pfn))
+		return 0;
+	/*
+	 * If memory does not end at a SECTION_SIZE boundary, then
+	 * we might have a pfn_valid() above max_pfn
+	 */
+	if (pfn >= max_pfn)
+		return 0;
+
+	/* Record the next PFN to read in the file offset */
+	*ppos = (pfn - min_low_pfn) + 1;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	//printk("page: %p\n", page);
+	ret = snprintf(kbuf, count, "Page allocated via order %d, mask 0x%x\n",
+			page->order, page->gfp_mask);
+	if (ret >= count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Print information relevant to grouping pages by mobility */
+	blocktype = get_pageblock_migratetype(page);
+	pagetype  = allocflags_to_migratetype(page->gfp_mask);
+	ret += snprintf(kbuf+ret, count-ret,
+			"PFN %lu Block %lu type %d %s "
+			"Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+			pfn,
+			pfn >> pageblock_order,
+			blocktype,
+			blocktype != pagetype ? "Fallback" : "        ",
+			PageLocked(page)	? "K" : " ",
+			PageError(page)		? "E" : " ",
+			PageReferenced(page)	? "R" : " ",
+			PageUptodate(page)	? "U" : " ",
+			PageDirty(page)		? "D" : " ",
+			PageLRU(page)		? "L" : " ",
+			PageActive(page)	? "A" : " ",
+			PageSlab(page)		? "S" : " ",
+			PageWriteback(page)	? "W" : " ",
+			PageCompound(page)	? "C" : " ",
+			PageSwapCache(page)	? "B" : " ",
+			PageMappedToDisk(page)	? "M" : " ");
+	if (ret >= count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	num_written = ret;
+
+	ret = snprint_stack_trace(kbuf + num_written, count - num_written,
+				  &page->trace, 0);
+	if (ret >= count - num_written) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	num_written += ret;
+
+	ret = snprintf(kbuf + num_written, count - num_written, "\n");
+	if (ret >= count - num_written) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	num_written += ret;
+	ret = num_written;
+
+	if (copy_to_user(buf, kbuf, ret))
+		ret = -EFAULT;
+out:
+	kfree(kbuf);
+	return ret;
+}
+
+static struct file_operations proc_page_owner_operations = {
+	.read		= read_page_owner,
+};
+
+static int __init pageowner_init(void)
+{
+	struct dentry *dentry;
+
+	dentry = debugfs_create_file("page_owner", S_IRUSR, NULL,
+			NULL, &proc_page_owner_operations);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+	return 0;
+}
+module_init(pageowner_init)
diff -puN mm/vmstat.c~pageowner mm/vmstat.c
--- linux-2.6.git/mm/vmstat.c~pageowner	2012-12-07 16:22:43.868270724 -0500
+++ linux-2.6.git-dave/mm/vmstat.c	2012-12-07 16:22:43.880270828 -0500
@@ -19,6 +19,7 @@
 #include <linux/math64.h>
 #include <linux/writeback.h>
 #include <linux/compaction.h>
+#include "internal.h"
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -921,6 +922,100 @@ static int pagetypeinfo_showblockcount(s
 	return 0;
 }
 
+#ifdef CONFIG_PAGE_OWNER
+static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
+							pg_data_t *pgdat,
+							struct zone *zone)
+{
+	int mtype, pagetype;
+	unsigned long pfn;
+	unsigned long start_pfn = zone->zone_start_pfn;
+	unsigned long end_pfn = start_pfn + zone->spanned_pages;
+	unsigned long count[MIGRATE_TYPES] = { 0, };
+
+	/* Align PFNs to pageblock_nr_pages boundary */
+	pfn = start_pfn & ~(pageblock_nr_pages-1);
+
+	/*
+	 * Walk the zone in pageblock_nr_pages steps. If a page block spans
+	 * a zone boundary, it will be double counted between zones. This does
+	 * not matter as the mixed block count will still be correct
+	 */
+	for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		struct page *page;
+		unsigned long offset = 0;
+
+		/* Do not read before the zone start, use a valid page */
+		if (pfn < start_pfn)
+			offset = start_pfn - pfn;
+
+		if (!pfn_valid(pfn + offset))
+			continue;
+
+		page = pfn_to_page(pfn + offset);
+		mtype = get_pageblock_migratetype(page);
+
+		/* Check the block for bad migrate types */
+		for (; offset < pageblock_nr_pages; offset++) {
+			/* Do not go past the end of the zone */
+			if (pfn + offset >= end_pfn)
+				break;
+
+			if (!pfn_valid_within(pfn + offset))
+				continue;
+
+			page = pfn_to_page(pfn + offset);
+
+			/* Skip free pages */
+			if (PageBuddy(page)) {
+				offset += (1UL << page_order(page)) - 1UL;
+				continue;
+			}
+			if (page->order < 0)
+				continue;
+
+			pagetype = allocflags_to_migratetype(page->gfp_mask);
+			if (pagetype != mtype) {
+				if (is_migrate_cma(pagetype))
+					count[MIGRATE_MOVABLE]++;
+				else
+					count[mtype]++;
+				break;
+			}
+
+			/* Move to end of this allocation */
+			offset += (1 << page->order) - 1;
+		}
+	}
+
+	/* Print counts */
+	seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+		seq_printf(m, "%12lu ", count[mtype]);
+	seq_putc(m, '\n');
+}
+#endif /* CONFIG_PAGE_OWNER */
+
+/*
+ * Print out the number of pageblocks for each migratetype that contain pages
+ * of other types. This gives an indication of how well fallbacks are being
+ * contained by rmqueue_fallback(). It requires information from PAGE_OWNER
+ * to determine what is going on
+ */
+static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
+{
+#ifdef CONFIG_PAGE_OWNER
+	int mtype;
+
+	seq_printf(m, "\n%-23s", "Number of mixed blocks ");
+	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+		seq_printf(m, "%12s ", migratetype_names[mtype]);
+	seq_putc(m, '\n');
+
+	walk_zones_in_node(m, pgdat, pagetypeinfo_showmixedcount_print);
+#endif /* CONFIG_PAGE_OWNER */
+}
+
 /*
  * This prints out statistics in relation to grouping pages by mobility.
  * It is expensive to collect so do not constantly read the file.
@@ -938,6 +1033,7 @@ static int pagetypeinfo_show(struct seq_
 	seq_putc(m, '\n');
 	pagetypeinfo_showfree(m, pgdat);
 	pagetypeinfo_showblockcount(m, pgdat);
+	pagetypeinfo_showmixedcount(m, pgdat);
 
 	return 0;
 }
_


* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-07 21:24 [PATCH] Debugging: Keep track of page owners Dave Hansen
@ 2012-12-07 22:26 ` Andrew Morton
  2012-12-07 22:37   ` Dave Hansen
  2012-12-08  2:51   ` Steven Rostedt
  2012-12-08  2:58 ` Steven Rostedt
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2012-12-07 22:26 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel

On Fri, 07 Dec 2012 16:24:17 -0500
Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> To: akpm@osdl.org

It's years since I was called that.

> From: mel@skynet.ie (Mel Gorman)

And him that.


I have cunningly divined the intention of your update and have queued
the below incremental.  The change to
pagetypeinfo_showmixedcount_print() was a surprise.  What's that there
for?




From: Dave Hansen <dave@linux.vnet.ibm.com>
Subject: debugging-keep-track-of-page-owners-fix

Use linux/stacktrace.h rather than hand-coding the stack tracing

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm_types.h   |    4 +-
 include/linux/stacktrace.h |    3 +
 kernel/stacktrace.c        |   23 ++++++++++++
 lib/Kconfig.debug          |    1 
 mm/page_alloc.c            |   65 ++++-------------------------------
 mm/pageowner.c             |   27 ++++----------
 mm/vmstat.c                |    5 ++
 7 files changed, 51 insertions(+), 77 deletions(-)

diff -puN include/linux/mm_types.h~debugging-keep-track-of-page-owners-fix include/linux/mm_types.h
--- a/include/linux/mm_types.h~debugging-keep-track-of-page-owners-fix
+++ a/include/linux/mm_types.h
@@ -8,6 +8,7 @@
 #include <linux/spinlock.h>
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
+#include <linux/stacktrace.h>
 #include <linux/completion.h>
 #include <linux/cpumask.h>
 #include <linux/page-debug-flags.h>
@@ -180,7 +181,8 @@ struct page {
 #ifdef CONFIG_PAGE_OWNER
 	int order;
 	unsigned int gfp_mask;
-	unsigned long trace[8];
+	struct stack_trace trace;
+	unsigned long trace_entries[8];
 #endif
 }
 /*
diff -puN include/linux/stacktrace.h~debugging-keep-track-of-page-owners-fix include/linux/stacktrace.h
--- a/include/linux/stacktrace.h~debugging-keep-track-of-page-owners-fix
+++ a/include/linux/stacktrace.h
@@ -20,6 +20,8 @@ extern void save_stack_trace_tsk(struct 
 				struct stack_trace *trace);
 
 extern void print_stack_trace(struct stack_trace *trace, int spaces);
+extern int  snprint_stack_trace(char *buf, int buf_len,
+				struct stack_trace *trace, int spaces);
 
 #ifdef CONFIG_USER_STACKTRACE_SUPPORT
 extern void save_stack_trace_user(struct stack_trace *trace);
@@ -32,6 +34,7 @@ extern void save_stack_trace_user(struct
 # define save_stack_trace_tsk(tsk, trace)		do { } while (0)
 # define save_stack_trace_user(trace)			do { } while (0)
 # define print_stack_trace(trace, spaces)		do { } while (0)
+# define snprint_stack_trace(buf, len, trace, spaces)	do { } while (0)
 #endif
 
 #endif
diff -puN kernel/stacktrace.c~debugging-keep-track-of-page-owners-fix kernel/stacktrace.c
--- a/kernel/stacktrace.c~debugging-keep-track-of-page-owners-fix
+++ a/kernel/stacktrace.c
@@ -11,6 +11,29 @@
 #include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
 
+int snprint_stack_trace(char *buf, int buf_len, struct stack_trace *trace,
+			int spaces)
+{
+	int ret = 0;
+	int i;
+
+	if (WARN_ON(!trace->entries))
+		return 0;
+
+	for (i = 0; i < trace->nr_entries; i++) {
+		unsigned long ip = trace->entries[i];
+		int printed = snprintf(buf, buf_len, "%*c[<%p>] %pS\n",
+				1 + spaces, ' ',
+				(void *) ip, (void *) ip);
+		buf_len -= printed;
+		ret += printed;
+		buf += printed;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(snprint_stack_trace);
+
 void print_stack_trace(struct stack_trace *trace, int spaces)
 {
 	int i;
diff -puN lib/Kconfig.debug~debugging-keep-track-of-page-owners-fix lib/Kconfig.debug
--- a/lib/Kconfig.debug~debugging-keep-track-of-page-owners-fix
+++ a/lib/Kconfig.debug
@@ -103,6 +103,7 @@ config PAGE_OWNER
 	bool "Track page owner"
 	depends on DEBUG_KERNEL
 	select DEBUG_FS
+	select STACKTRACE
 	help
 	  This keeps track of what call chain is the owner of a page, may
 	  help to find bare alloc_page(s) leaks. Eats a fair amount of memory.
diff -puN mm/page_alloc.c~debugging-keep-track-of-page-owners-fix mm/page_alloc.c
--- a/mm/page_alloc.c~debugging-keep-track-of-page-owners-fix
+++ a/mm/page_alloc.c
@@ -2267,62 +2267,21 @@ __perform_reclaim(gfp_t gfp_mask, unsign
 	return progress;
 }
 
-#ifdef CONFIG_PAGE_OWNER
-static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
-{
-	return	p > (void *)tinfo &&
-		p < (void *)tinfo + THREAD_SIZE - 3;
-}
-
-static inline void __stack_trace(struct page *page, unsigned long *stack,
-			unsigned long bp)
-{
-	int i = 0;
-	unsigned long addr;
-	struct thread_info *tinfo = (struct thread_info *)
-		((unsigned long)stack & (~(THREAD_SIZE - 1)));
-
-	memset(page->trace, 0, sizeof(long) * 8);
-
-#ifdef CONFIG_FRAME_POINTER
-	if (bp) {
-		while (valid_stack_ptr(tinfo, (void *)bp)) {
-			addr = *(unsigned long *)(bp + sizeof(long));
-			page->trace[i] = addr;
-			if (++i >= 8)
-				break;
-			bp = *(unsigned long *)bp;
-		}
-		return;
-	}
-#endif /* CONFIG_FRAME_POINTER */
-	while (valid_stack_ptr(tinfo, stack)) {
-		addr = *stack++;
-		if (__kernel_text_address(addr)) {
-			page->trace[i] = addr;
-			if (++i >= 8)
-				break;
-		}
-	}
-}
-
 static void set_page_owner(struct page *page, unsigned int order,
 			unsigned int gfp_mask)
 {
-	unsigned long address;
-	unsigned long bp = 0;
-#ifdef CONFIG_X86_64
-	asm ("movq %%rbp, %0" : "=r" (bp) : );
-#endif
-#ifdef CONFIG_X86_32
-	asm ("movl %%ebp, %0" : "=r" (bp) : );
-#endif
+#ifdef CONFIG_PAGE_OWNER
+	struct stack_trace *trace = &page->trace;
+	trace->nr_entries = 0;
+	trace->max_entries = ARRAY_SIZE(page->trace_entries);
+	trace->entries = &page->trace_entries[0];
+	trace->skip = 3;
+	save_stack_trace(&page->trace);
+
 	page->order = (int) order;
 	page->gfp_mask = gfp_mask;
-	__stack_trace(page, &address, bp);
-}
 #endif /* CONFIG_PAGE_OWNER */
-
+}
 
 /* The really slow allocator path where we enter direct reclaim */
 static inline struct page *
@@ -2359,10 +2318,8 @@ retry:
 		goto retry;
 	}
 
-#ifdef CONFIG_PAGE_OWNER
 	if (page)
 		set_page_owner(page, order, gfp_mask);
-#endif
 	return page;
 }
 
@@ -2672,10 +2629,8 @@ nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 	return page;
 got_pg:
-#ifdef CONFIG_PAGE_OWNER
 	if (page)
 		set_page_owner(page, order, gfp_mask);
-#endif
 	if (kmemcheck_enabled)
 		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
 
@@ -2758,10 +2713,8 @@ out:
 
 	memcg_kmem_commit_charge(page, memcg, order);
 
-#ifdef CONFIG_PAGE_OWNER
 	if (page)
 		set_page_owner(page, order, gfp_mask);
-#endif
 
 	return page;
 }
diff -puN mm/pageowner.c~debugging-keep-track-of-page-owners-fix mm/pageowner.c
--- a/mm/pageowner.c~debugging-keep-track-of-page-owners-fix
+++ a/mm/pageowner.c
@@ -26,12 +26,8 @@ read_page_owner(struct file *file, char 
 {
 	unsigned long pfn;
 	struct page *page;
-	char *kbuf, *modname;
-	const char *symname;
+	char *kbuf;
 	int ret = 0;
-	char namebuf[128];
-	unsigned long offset = 0, symsize;
-	int i;
 	ssize_t num_written = 0;
 	int blocktype = 0, pagetype = 0;
 
@@ -67,7 +63,7 @@ read_page_owner(struct file *file, char 
 				pfn);
 
 		/* Stop search if page is allocated and has trace info */
-		if (page->order >= 0 && page->trace[0]) {
+		if (page->order >= 0 && page->trace.nr_entries) {
 			//intk("stopped search at pfn: %ld\n", pfn);
 			break;
 		}
@@ -126,20 +122,13 @@ read_page_owner(struct file *file, char 
 
 	num_written = ret;
 
-	for (i = 0; i < 8; i++) {
-		if (!page->trace[i])
-			break;
-		symname = kallsyms_lookup(page->trace[i], &symsize, &offset,
-					&modname, namebuf);
-		ret = snprintf(kbuf + num_written, count - num_written,
-				"[0x%lx] %s+%lu\n",
-				page->trace[i], namebuf, offset);
-		if (ret >= count - num_written) {
-			ret = -ENOMEM;
-			goto out;
-		}
-		num_written += ret;
+	ret = snprint_stack_trace(kbuf + num_written, count - num_written,
+				  &page->trace, 0);
+	if (ret >= count - num_written) {
+		ret = -ENOMEM;
+		goto out;
 	}
+	num_written += ret;
 
 	ret = snprintf(kbuf + num_written, count - num_written, "\n");
 	if (ret >= count - num_written) {
diff -puN mm/vmstat.c~debugging-keep-track-of-page-owners-fix mm/vmstat.c
--- a/mm/vmstat.c~debugging-keep-track-of-page-owners-fix
+++ a/mm/vmstat.c
@@ -985,7 +985,10 @@ static void pagetypeinfo_showmixedcount_
 
 			pagetype = allocflags_to_migratetype(page->gfp_mask);
 			if (pagetype != mtype) {
-				count[mtype]++;
+				if (is_migrate_cma(pagetype))
+					count[MIGRATE_MOVABLE]++;
+				else
+					count[mtype]++;
 				break;
 			}
 
_


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-07 22:26 ` Andrew Morton
@ 2012-12-07 22:37   ` Dave Hansen
  2012-12-07 22:44     ` Andrew Morton
  2012-12-08  2:51   ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2012-12-07 22:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Mel Gorman

On 12/07/2012 02:26 PM, Andrew Morton wrote:
> I have cunningly divined the intention of your update and have queued
> the below incremental.  The change to
> pagetypeinfo_showmixedcount_print() was a surprise.  What's that there
> for?

Do you mean to ask why it's being modified at all here in this patch?
It's referenced in the changelog a bit.  I believe it came from Mel at
some point.  I didn't do much to that portion, but I happily dragged those
hunks along with my forward port.  I believe it's virtually all the same
as what you posted here:

	https://bugzilla.kernel.org/show_bug.cgi?id=50181



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-07 22:37   ` Dave Hansen
@ 2012-12-07 22:44     ` Andrew Morton
  2012-12-07 22:58       ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2012-12-07 22:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, Mel Gorman

On Fri, 07 Dec 2012 14:37:27 -0800
Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On 12/07/2012 02:26 PM, Andrew Morton wrote:
> > I have cunningly divined the intention of your update and have queued
> > the below incremental.  The change to
> > pagetypeinfo_showmixedcount_print() was a surprise.  What's that there
> > for?
> 
> Do you mean to ask why it's being modified at all here in this patch?
> It's referenced in the changelog a bit.  I believe it came from Mel at
> some point.  I didn't do much to that portion, but I happily dragged those
> hunks along with my forward port.  I believe it's virtually all the same
> as what you posted here:
> 
> 	https://bugzilla.kernel.org/show_bug.cgi?id=50181

I'm now deeply confused.

Two days ago I merged a version which you sent me.  It is presently at
http://ozlabs.org/~akpm/mmotm/broken-out/debugging-keep-track-of-page-owners.patch.

Today you sent a new version.  My earlier email showed the difference
between these two versions.

AFAICT that difference was undescribed.  I can see that the new version
uses the stack-tracing infrastructure, but the change to
pagetypeinfo_showmixedcount_print() is a mystery.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-07 22:44     ` Andrew Morton
@ 2012-12-07 22:58       ` Dave Hansen
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2012-12-07 22:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Mel Gorman

On 12/07/2012 02:44 PM, Andrew Morton wrote:
> AFACIT that difference was undescribed.  I can see that the new version
> uses the stack-tracing infrastructure, but the change to
> pagetypeinfo_showmixedcount_print() is a mystery.

Ahhh, I assume you're talking about this hunk:

>> @@ -976,10 +976,7 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
>>  
>>                         pagetype = allocflags_to_migratetype(page->gfp_mask);
>>                         if (pagetype != mtype) {
>> -                               if (is_migrate_cma(pagetype))
>> -                                       count[MIGRATE_MOVABLE]++;
>> -                               else
>> -                                       count[mtype]++;
>> +                               count[mtype]++;
>>                                 break;
>>                         }

That was to fix the comment that Laura Abbott made about it miscounting
MIGRATE_CMA pages.
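
For reference, is_migrate_cma() is (in 3.7) just a CONFIG_CMA-gated
comparison, from include/linux/mmzone.h:

	#ifdef CONFIG_CMA
	#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
	#else
	#  define is_migrate_cma(migratetype) false
	#endif

CMA pageblocks back movable allocations by design, so lumping those
pages in with the block's declared type is what produced the miscount.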

My patch-sending scripts were choking a bit on the text description in
your patch.  I'm using a long-ago-forked copy of your patch-utils, and
the DESC/EDESC markers in the patch I imported give them fits when I
send via email, stripping large parts of the description.  I'm happy to
resend via email, too, but here's the raw patch (with the full description):

	https://www.sr71.net/~dave/linux/pageowner.patch

The important description that the scripts managed to strip out when
emailed was this:

Updated 12/4/2012 - should apply to 3.7 kernels.  I did a quick
sniff-test to make sure that this boots and produces some sane
output, but it's not been exhaustively tested.

 * Moved file over to debugfs (no reason to keep polluting /proc;
   see the registration sketch below)
 * Now using generic stack tracking infrastructure
 * Added check for MIGRATE_CMA pages to explicitly count them
   as movable.

The new snprint_stack_trace() probably belongs in its own patch
if this were to get merged, but it won't kill anyone as it stands.
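
The debugfs side, for reference, is the usual one-liner registration.
A sketch (the fops and init function names here are illustrative;
read_page_owner() is the real callback from the patch):

	#include <linux/debugfs.h>
	#include <linux/fs.h>
	#include <linux/init.h>

	static ssize_t read_page_owner(struct file *file, char __user *buf,
				       size_t count, loff_t *ppos);

	static const struct file_operations page_owner_fops = {
		.read = read_page_owner,
	};

	static int __init pageowner_init(void)
	{
		/* appears as /sys/kernel/debug/page_owner */
		if (!debugfs_create_file("page_owner", S_IRUSR, NULL,
					 NULL, &page_owner_fops))
			return -ENOMEM;
		return 0;
	}
	late_initcall(pageowner_init);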

-----


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-07 22:26 ` Andrew Morton
  2012-12-07 22:37   ` Dave Hansen
@ 2012-12-08  2:51   ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2012-12-08  2:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Hansen, linux-mm, linux-kernel

On Fri, Dec 07, 2012 at 02:26:14PM -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2012 16:24:17 -0500
> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> 
> > To: akpm@osdl.org
> 
> It's years since I was called that.

"Help me akpm@osdl.org. You're my only hope"...

> 
> > From: mel@skynet.ie (Mel Gorman)
> 
> And him that.
> 

"mel@skynet.ie is my father!" - Luke Skynet.ie

-- Steve ;-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Debugging: Keep track of page owners
  2012-12-07 21:24 [PATCH] Debugging: Keep track of page owners Dave Hansen
  2012-12-07 22:26 ` Andrew Morton
@ 2012-12-08  2:58 ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2012-12-08  2:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: akpm, linux-mm, linux-kernel

On Fri, Dec 07, 2012 at 04:24:17PM -0500, Dave Hansen wrote:
> 
> diff -puN /dev/null Documentation/page_owner.c

Can we stop putting code into Documentation? We have tools, samples and
usr directories. I'm sure this could fit into one of them.

-- Steve

> --- /dev/null	2012-06-13 15:09:09.708529931 -0400
> +++ linux-2.6.git-dave/Documentation/page_owner.c	2012-12-07 16:22:43.872270758 -0500
> @@ -0,0 +1,144 @@
> +/*
> + * User-space helper to sort the output of /sys/kernel/debug/page_owner
> + *
> + * Example use:
> + * cat /sys/kernel/debug/page_owner > page_owner_full.txt
> + * grep -v ^PFN page_owner_full.txt > page_owner.txt
> + * ./sort page_owner.txt sorted_page_owner.txt
> +*/
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <string.h>
> +
> +struct block_list {
> +	char *txt;
> +	int len;
> +	int num;
> +};
> +
> +
> +static struct block_list *list;
> +static int list_size;
> +static int max_size;
> +
> +struct block_list *block_head;
> +
> +int read_block(char *buf, FILE *fin)
> +{
> +	int ret = 0;
> +	int hit = 0;
> +	char *curr = buf;
> +
> +	for (;;) {
> +		int ch = getc(fin);	/* int, so EOF is detectable */
> +
> +		if (ch == EOF)
> +			return -1;
> +		*curr = ch;
> +
> +		ret++;
> +		if (*curr == '\n' && hit == 1)
> +			return ret - 1;
> +		else if (*curr == '\n')
> +			hit = 1;
> +		else
> +			hit = 0;
> +		curr++;
> +	}
> +}
> +
> +static int compare_txt(struct block_list *l1, struct block_list *l2)
> +{
> +	return strcmp(l1->txt, l2->txt);
> +}
> +
> +static int compare_num(struct block_list *l1, struct block_list *l2)
> +{
> +	return l2->num - l1->num;
> +}
> +
> +static void add_list(char *buf, int len)
> +{
> +	if (list_size != 0 &&
> +	    len == list[list_size-1].len &&
> +	    memcmp(buf, list[list_size-1].txt, len) == 0) {
> +		list[list_size-1].num++;
> +		return;
> +	}
> +	if (list_size == max_size) {
> +		printf("max_size too small??\n");
> +		exit(1);
> +	}
> +	list[list_size].txt = malloc(len+1);
> +	list[list_size].len = len;
> +	list[list_size].num = 1;
> +	memcpy(list[list_size].txt, buf, len);
> +	list[list_size].txt[len] = 0;
> +	list_size++;
> +	if (list_size % 1000 == 0) {
> +		printf("loaded %d\r", list_size);
> +		fflush(stdout);
> +	}
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	FILE *fin, *fout;
> +	char buf[1024];
> +	int ret, i, count;
> +	struct block_list *list2;
> +	struct stat st;
> +
> +	fin = fopen(argv[1], "r");
> +	fout = fopen(argv[2], "w");
> +	if (!fin || !fout) {
> +		printf("Usage: ./program <input> <output>\n");
> +		perror("open: ");
> +		exit(2);
> +	}
> +
> +	fstat(fileno(fin), &st);
> +	max_size = st.st_size / 100; /* hack ... */
> +
> +	list = malloc(max_size * sizeof(*list));
> +
> +	for(;;) {
> +		ret = read_block(buf, fin);
> +		if (ret < 0)
> +			break;
> +
> +		buf[ret] = '\0';
> +		add_list(buf, ret);
> +	}
> +
> +	printf("loaded %d\n", list_size);
> +
> +	printf("sorting ....\n");
> +
> +	qsort(list, list_size, sizeof(list[0]), compare_txt);
> +
> +	list2 = malloc(sizeof(*list) * list_size);
> +
> +	printf("culling\n");
> +
> +	for (i=count=0;i<list_size;i++) {
> +		if (count == 0 ||
> +		    strcmp(list2[count-1].txt, list[i].txt) != 0) {
> +			list2[count++] = list[i];
> +		} else {
> +			list2[count-1].num += list[i].num;
> +		}
> +	}
> +
> +	qsort(list2, count, sizeof(list[0]), compare_num);
> +
> +	for (i=0;i<count;i++) {
> +		fprintf(fout, "%d times:\n%s\n", list2[i].num, list2[i].txt);
> +	}
> +	return 0;
> +}


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-12-08  2:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-07 21:24 [PATCH] Debugging: Keep track of page owners Dave Hansen
2012-12-07 22:26 ` Andrew Morton
2012-12-07 22:37   ` Dave Hansen
2012-12-07 22:44     ` Andrew Morton
2012-12-07 22:58       ` Dave Hansen
2012-12-08  2:51   ` Steven Rostedt
2012-12-08  2:58 ` Steven Rostedt
  -- strict thread matches above, loose matches on Subject: below --
2012-12-05  1:12 Dave Hansen
2012-12-05 15:01 ` Laura Abbott
2012-12-05 17:48   ` Dave Hansen
2012-12-05 22:36     ` Andrew Morton
2012-12-07 16:37     ` Laura Abbott
