From: Larry Woodman <lwoodman@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>,
	"Ingo Molnar" <mingo@elte.hu>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Li Zefan" <lizf@cn.fujitsu.com>,
	"Pekka Enberg" <penberg@cs.helsinki.fi>,
	eduard.munteanu@linux360.ro, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rostedt@goodmis.org, lwoodman@redhat.com,
	"Linda Wang" <lwang@redhat.com>
Subject: Re: [Patch] mm tracepoints update - use case.
Date: Wed, 17 Jun 2009 10:07:01 -0400
Message-ID: <4A38F885.8040009@redhat.com>
In-Reply-To: <4A36925D.4090000@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3462 bytes --]

Rik van Riel wrote:
>
> Sorry I am replying to a really old email, but exactly
> what information do you believe would be more useful to
> extract from vmscan.c with tracepoints?
>
> What are the kinds of problems that customer systems
> (which cannot be rebooted into experimental kernels)
> run into, that can be tracked down with tracepoints?
>
> I can think of a few:
> - excessive CPU use in page reclaim code
> - excessive reclaim latency in page reclaim code
> - unbalanced memory allocation between zones/nodes
> - strange balance problems between reclaiming of page
>   cache and swapping out process pages
>
> I suspect we would need fairly fine grained tracepoints
> to track down these kinds of problems, with filtering
> and/or interpretation in userspace, but I am always
> interested in easier ways of tracking down these kinds
> of problems :)
>
> What kinds of tracepoints do you believe we would need?
>
> Or, using Larry's patch as a starting point, what do you
> believe should be changed?
>

Rik, I know these mm tracepoint patches produce a lot of output in the
trace buffer. In a nutshell, I have added them at critical locations in
the code that allocates memory, maps that memory into user space, unmaps
it from user space, and frees it. In addition, I have added tracepoints
to important places in the memory allocation and reclaim paths so we can
see failures, stalls, and high latencies as well as normal behavior.
Finally, I added them to the pdflush operations so we can compare the
amount of memory written back to disk there with the amount written by
the swapout paths. If this is too many tracepoints all at once, perhaps
we could focus mainly on those specific to the page reclaim code path,
since that is where most contention occurs?

Anonymous memory tracepoints:
1.) mm_anon_fault - initial anonymous pagefault.
2.) mm_anon_unmap - anonymous unmap triggered by page reclaim.
3.) mm_anon_userfree - anonymous memory unmap by user.
4.) mm_anon_cow - anonymous COW fault
5.) mm_anon_pgin - anonymous pagein from swap.

Filemap memory tracepoints:
1.) mm_filemap_fault - initial filemap fault.
2.) mm_filemap_cow - filemap COW fault.
3.) mm_filemap_userunmap - filemap unmap by user.
4.) mm_filemap_unmap - filemap unmap triggered by page reclaim.

Page allocation failure tracepoints:
1.) mm_page_allocation - page allocation that fails and causes page reclaim.

Page kswapd and direct reclaim tracepoints:
1.) mm_kswapd_ran - kswapd ran and tells us how many pages it reclaimed.
2.) mm_directreclaim_reclaimall - direct reclaim because free lists were
    below min.
3.) mm_directreclaim_reclaimzone - direct reclaim of a specific NUMA node.

Inner workings of the page reclaim tracepoints:
1.) mm_pagereclaim_shrinkzone - shrink zone, tells us how many pages
    were scanned.
2.) mm_pagereclaim_shrinkinactive - shrink inactive list, tells us how
    many pages were deactivated.
3.) mm_pagereclaim_shrinkactive - shrink active list, tells us how
    many pages were processed.
4.) mm_pagereclaim_pgout - pageout, tells us which pages were paged out.
5.) mm_pagereclaim_free - tells us how many pages were freed in each
    page reclaim invocation.
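
For the userspace interpretation Rik mentions, a minimal sketch of a
parser for one of these events might look like the following. It assumes
the text payload matches the TP_printk format string for
mm_pagereclaim_shrinkinactive in the attached patch
("scanned=..., reclaimed=..., priority=..."), and that the payload
appears somewhere in each line read from the trace buffer. The struct
and function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by the mm_pagereclaim_shrinkinactive TP_printk format. */
struct shrink_inactive_sample {
	unsigned long scanned;
	unsigned long reclaimed;
	int priority;
};

/*
 * Parse one trace line (hypothetical helper, not part of the patch).
 * Returns 0 on success, -1 if the line does not carry the expected
 * "scanned=..., reclaimed=..., priority=..." payload. On failure the
 * contents of *out are unspecified.
 */
static int parse_shrink_inactive(const char *line,
				 struct shrink_inactive_sample *out)
{
	const char *payload;

	assert(out != NULL);
	if (line == NULL)
		return -1;

	/* Skip whatever prefix the tracer prints before the payload. */
	payload = strstr(line, "scanned=");
	if (payload == NULL)
		return -1;

	if (sscanf(payload, "scanned=%lu, reclaimed=%lu, priority=%d",
		   &out->scanned, &out->reclaimed, &out->priority) != 3)
		return -1;
	return 0;
}

/* Percentage of scanned pages that were reclaimed; 0 if none were scanned. */
static unsigned long reclaim_percent(const struct shrink_inactive_sample *s)
{
	if (s->scanned == 0)
		return 0;
	return s->reclaimed * 100 / s->scanned;
}
```

A consistently low reclaim_percent() at low priority values would be one
way to spot the "excessive CPU use in page reclaim" case from the list
above, although the threshold to use is a judgment call.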

Pagecache flushing tracepoints:
1.) mm_balance_dirty - tells us how many pages were written when dirty
    was above dirty_ratio.
2.) mm_pdflush_bgwriteout - tells us how many pages were written when
    dirty was above dirty_background_ratio.
3.) mm_pdflush_kupdate - tells us how many pages kupdate wrote.
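
As a rough illustration of how the three flushing events could be
compared in userspace, the sketch below tallies pages per writeback
source. It assumes each trace line contains the event name followed by
": " and the payload from the TP_printk strings in the attached patch
("written=" for mm_balance_dirty and mm_pdflush_bgwriteout, "writes="
for mm_pdflush_kupdate). The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Running totals, in pages, for each writeback source listed above. */
struct writeback_totals {
	unsigned long balance_dirty;	/* mm_balance_dirty: written= */
	unsigned long bgwriteout;	/* mm_pdflush_bgwriteout: written= */
	unsigned long kupdate;		/* mm_pdflush_kupdate: writes= */
};

/*
 * Add one trace line to the totals (hypothetical helper, not part of
 * the patch). Returns 1 if the line was one of the three flushing
 * events and was accounted, 0 otherwise.
 */
static int account_writeback(const char *line, struct writeback_totals *t)
{
	const char *p;
	unsigned long n;

	if (line == NULL || t == NULL)
		return 0;

	p = strstr(line, "mm_balance_dirty: ");
	if (p != NULL && sscanf(p, "mm_balance_dirty: written=%lu", &n) == 1) {
		t->balance_dirty += n;
		return 1;
	}

	p = strstr(line, "mm_pdflush_bgwriteout: ");
	if (p != NULL &&
	    sscanf(p, "mm_pdflush_bgwriteout: written=%lu", &n) == 1) {
		t->bgwriteout += n;
		return 1;
	}

	p = strstr(line, "mm_pdflush_kupdate: ");
	if (p != NULL && sscanf(p, "mm_pdflush_kupdate: writes=%lu", &n) == 1) {
		t->kupdate += n;
		return 1;
	}

	return 0;
}
```

Feeding every line of the trace output through account_writeback() gives
per-source totals that can then be set against the pageout counts from
the reclaim tracepoints.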



[-- Attachment #2: mmtracepoints-617.diff --]
[-- Type: text/plain, Size: 17240 bytes --]

diff --git a/include/trace/events/mm.h b/include/trace/events/mm.h
new file mode 100644
index 0000000..1d888a4
--- /dev/null
+++ b/include/trace/events/mm.h
@@ -0,0 +1,436 @@
+#if !defined(_TRACE_MM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MM_H
+
+#include <linux/mm.h>
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mm
+
+TRACE_EVENT(mm_anon_fault,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+);
+
+TRACE_EVENT(mm_anon_pgin,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_anon_cow,
+
+	TP_PROTO(struct mm_struct *mm,
+			unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_anon_userfree,
+
+	TP_PROTO(struct mm_struct *mm,
+			unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_anon_unmap,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_filemap_fault,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address, int flag),
+	TP_ARGS(mm, address, flag),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+		__field(int, flag)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+		__entry->flag = flag;
+	),
+
+	TP_printk("%s: mm=%lx address=%lx",
+		__entry->flag ? "pagein" : "primary fault",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_filemap_cow,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_filemap_unmap,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_filemap_userunmap,
+
+	TP_PROTO(struct mm_struct *mm, unsigned long address),
+
+	TP_ARGS(mm, address),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__field(unsigned long, address)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__entry->address = address;
+	),
+
+	TP_printk("mm=%lx address=%lx",
+		(unsigned long)__entry->mm, __entry->address)
+	);
+
+TRACE_EVENT(mm_pagereclaim_pgout,
+
+	TP_PROTO(struct address_space *mapping, unsigned long offset, int anon),
+
+	TP_ARGS(mapping, offset, anon),
+
+	TP_STRUCT__entry(
+		__field(struct address_space *, mapping)
+		__field(unsigned long, offset)
+		__field(int, anon)
+	),
+
+	TP_fast_assign(
+		__entry->mapping = mapping;
+		__entry->offset = offset;
+		__entry->anon = anon;
+	),
+
+	TP_printk("mapping=%lx, offset=%lx %s",
+		(unsigned long)__entry->mapping, __entry->offset, 
+			__entry->anon ? "anonymous" : "pagecache")
+	);
+
+TRACE_EVENT(mm_pagereclaim_free,
+
+	TP_PROTO(unsigned long nr_reclaimed),
+
+	TP_ARGS(nr_reclaimed),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, nr_reclaimed)
+	),
+
+	TP_fast_assign(
+		__entry->nr_reclaimed = nr_reclaimed;
+	),
+
+	TP_printk("freed=%ld", __entry->nr_reclaimed)
+	);
+
+TRACE_EVENT(mm_pdflush_bgwriteout,
+
+	TP_PROTO(unsigned long written),
+
+	TP_ARGS(written),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, written)
+	),
+
+	TP_fast_assign(
+		__entry->written = written;
+	),
+
+	TP_printk("written=%ld", __entry->written)
+	);
+
+TRACE_EVENT(mm_pdflush_kupdate,
+
+	TP_PROTO(unsigned long writes),
+
+	TP_ARGS(writes),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, writes)
+	),
+
+	TP_fast_assign(
+		__entry->writes = writes;
+	),
+
+	TP_printk("writes=%ld", __entry->writes)
+	);
+
+TRACE_EVENT(mm_balance_dirty,
+
+	TP_PROTO(unsigned long written),
+
+	TP_ARGS(written),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, written)
+	),
+
+	TP_fast_assign(
+		__entry->written = written;
+	),
+
+	TP_printk("written=%ld", __entry->written)
+	);
+
+TRACE_EVENT(mm_page_allocation,
+
+	TP_PROTO(unsigned long free),
+
+	TP_ARGS(free),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, free)
+	),
+
+	TP_fast_assign(
+		__entry->free = free;
+	),
+
+	TP_printk("zone_free=%ld", __entry->free)
+	);
+
+TRACE_EVENT(mm_kswapd_ran,
+
+	TP_PROTO(struct pglist_data *pgdat, unsigned long reclaimed),
+
+	TP_ARGS(pgdat, reclaimed),
+
+	TP_STRUCT__entry(
+		__field(struct pglist_data *, pgdat)
+		__field(int, node_id)
+		__field(unsigned long, reclaimed)
+	),
+
+	TP_fast_assign(
+		__entry->pgdat = pgdat;
+		__entry->node_id = pgdat->node_id;
+		__entry->reclaimed = reclaimed;
+	),
+
+	TP_printk("node=%d reclaimed=%ld", __entry->node_id, __entry->reclaimed)
+	);
+
+TRACE_EVENT(mm_directreclaim_reclaimall,
+
+	TP_PROTO(int node, unsigned long reclaimed, unsigned long priority),
+
+	TP_ARGS(node, reclaimed, priority),
+
+	TP_STRUCT__entry(
+		__field(int, node)
+		__field(unsigned long, reclaimed)
+		__field(unsigned long, priority)
+	),
+
+	TP_fast_assign(
+		__entry->node = node;
+		__entry->reclaimed = reclaimed;
+		__entry->priority = priority;
+	),
+
+	TP_printk("node=%d reclaimed=%ld priority=%ld", __entry->node, __entry->reclaimed, 
+					__entry->priority)
+	);
+
+TRACE_EVENT(mm_directreclaim_reclaimzone,
+
+	TP_PROTO(int node, unsigned long reclaimed, unsigned long priority),
+
+	TP_ARGS(node, reclaimed, priority),
+
+	TP_STRUCT__entry(
+		__field(int, node)
+		__field(unsigned long, reclaimed)
+		__field(unsigned long, priority)
+	),
+
+	TP_fast_assign(
+		__entry->node = node;
+		__entry->reclaimed = reclaimed;
+		__entry->priority = priority;
+	),
+
+	TP_printk("node = %d reclaimed=%ld, priority=%ld",
+			__entry->node, __entry->reclaimed, __entry->priority)
+	);
+TRACE_EVENT(mm_pagereclaim_shrinkzone,
+
+	TP_PROTO(unsigned long reclaimed, unsigned long priority),
+
+	TP_ARGS(reclaimed, priority),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, reclaimed)
+		__field(unsigned long, priority)
+	),
+
+	TP_fast_assign(
+		__entry->reclaimed = reclaimed;
+		__entry->priority = priority;
+	),
+
+	TP_printk("reclaimed=%ld priority=%ld",
+			__entry->reclaimed, __entry->priority)
+	);
+
+TRACE_EVENT(mm_pagereclaim_shrinkactive,
+
+	TP_PROTO(unsigned long scanned, int file, int priority),
+
+	TP_ARGS(scanned, file, priority),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, scanned)
+		__field(int, file)
+		__field(int, priority)
+	),
+
+	TP_fast_assign(
+		__entry->scanned = scanned;
+		__entry->file = file;
+		__entry->priority = priority;
+	),
+
+	TP_printk("scanned=%ld, %s, priority=%d",
+		__entry->scanned, __entry->file ? "pagecache" : "anonymous",
+		__entry->priority)
+	);
+
+TRACE_EVENT(mm_pagereclaim_shrinkinactive,
+
+	TP_PROTO(unsigned long scanned, unsigned long reclaimed,
+			int priority),
+
+	TP_ARGS(scanned, reclaimed, priority),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, scanned)
+		__field(unsigned long, reclaimed)
+		__field(int, priority)
+	),
+
+	TP_fast_assign(
+		__entry->scanned = scanned;
+		__entry->reclaimed = reclaimed;
+		__entry->priority = priority;
+	),
+
+	TP_printk("scanned=%ld, reclaimed=%ld, priority=%d",
+		__entry->scanned, __entry->reclaimed, 
+		__entry->priority)
+	);
+
+#endif /* _TRACE_MM_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/filemap.c b/mm/filemap.c
index 1b60f30..af4a964 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -34,6 +34,7 @@
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
+#include <trace/events/mm.h>
 #include "internal.h"
 
 /*
@@ -1568,6 +1569,8 @@ retry_find:
 	 */
 	ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
 	vmf->page = page;
+	trace_mm_filemap_fault(vma->vm_mm, (unsigned long)vmf->virtual_address,
+			vmf->flags&FAULT_FLAG_NONLINEAR);
 	return ret | VM_FAULT_LOCKED;
 
 no_cached_page:
diff --git a/mm/memory.c b/mm/memory.c
index 4126dd1..a4a580c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -61,6 +61,7 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
+#include <trace/events/mm.h>
 
 #include "internal.h"
 
@@ -812,15 +813,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 						addr) != page->index)
 				set_pte_at(mm, addr, pte,
 					   pgoff_to_pte(page->index));
-			if (PageAnon(page))
+			if (PageAnon(page)) {
 				anon_rss--;
-			else {
+				trace_mm_anon_userfree(mm, addr);
+			} else {
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent) &&
 				    likely(!VM_SequentialReadHint(vma)))
 					mark_page_accessed(page);
 				file_rss--;
+				trace_mm_filemap_userunmap(mm, addr);
 			}
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
@@ -1896,7 +1899,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long address, pte_t *page_table, pmd_t *pmd,
 		spinlock_t *ptl, pte_t orig_pte)
 {
-	struct page *old_page, *new_page;
+	struct page *old_page, *new_page = NULL;
 	pte_t entry;
 	int reuse = 0, ret = 0;
 	int page_mkwrite = 0;
@@ -2050,9 +2053,12 @@ gotten:
 			if (!PageAnon(old_page)) {
 				dec_mm_counter(mm, file_rss);
 				inc_mm_counter(mm, anon_rss);
+				trace_mm_filemap_cow(mm, address);
 			}
-		} else
+		} else {
 			inc_mm_counter(mm, anon_rss);
+			trace_mm_anon_cow(mm, address);
+		}
 		flush_cache_page(vma, address, pte_pfn(orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
@@ -2449,7 +2455,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		int write_access, pte_t orig_pte)
 {
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *page = NULL;
 	swp_entry_t entry;
 	pte_t pte;
 	struct mem_cgroup *ptr = NULL;
@@ -2549,6 +2555,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 out:
+	trace_mm_anon_pgin(mm, address);
 	return ret;
 out_nomap:
 	mem_cgroup_cancel_charge_swapin(ptr);
@@ -2582,6 +2589,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto oom;
 	__SetPageUptodate(page);
 
+	trace_mm_anon_fault(mm, address);
 	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))
 		goto oom_free_page;
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index bb553c3..ef92a97 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -34,6 +34,7 @@
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h>
 #include <linux/pagevec.h>
+#include <trace/events/mm.h>
 
 /*
  * The maximum number of pages to writeout in a single bdflush/kupdate
@@ -574,6 +575,7 @@ static void balance_dirty_pages(struct address_space *mapping)
 		congestion_wait(WRITE, HZ/10);
 	}
 
+	trace_mm_balance_dirty(pages_written);
 	if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
 			bdi->dirty_exceeded)
 		bdi->dirty_exceeded = 0;
@@ -716,6 +718,7 @@ static void background_writeout(unsigned long _min_pages)
 				break;
 		}
 	}
+	trace_mm_pdflush_bgwriteout(_min_pages);
 }
 
 /*
@@ -776,6 +779,7 @@ static void wb_kupdate(unsigned long arg)
 	nr_to_write = global_page_state(NR_FILE_DIRTY) +
 			global_page_state(NR_UNSTABLE_NFS) +
 			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
+	trace_mm_pdflush_kupdate(nr_to_write);
 	while (nr_to_write > 0) {
 		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0727896..ca9355e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -48,6 +48,7 @@
 #include <linux/page_cgroup.h>
 #include <linux/debugobjects.h>
 #include <linux/kmemleak.h>
+#include <trace/events/mm.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1440,6 +1441,7 @@ zonelist_scan:
 				mark = zone->pages_high;
 			if (!zone_watermark_ok(zone, order, mark,
 				    classzone_idx, alloc_flags)) {
+				trace_mm_page_allocation(zone_page_state(zone, NR_FREE_PAGES));
 				if (!zone_reclaim_mode ||
 				    !zone_reclaim(zone, gfp_mask, order))
 					goto this_zone_full;
diff --git a/mm/rmap.c b/mm/rmap.c
index 23122af..f2156ca 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -50,6 +50,7 @@
 #include <linux/memcontrol.h>
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
+#include <trace/events/mm.h>
 
 #include <asm/tlbflush.h>
 
@@ -1025,6 +1026,7 @@ static int try_to_unmap_anon(struct page *page, int unlock, int migration)
 			if (mlocked)
 				break;	/* stop if actually mlocked page */
 		}
+		trace_mm_anon_unmap(vma->vm_mm, vma->vm_start+page->index);
 	}
 
 	page_unlock_anon_vma(anon_vma);
@@ -1152,6 +1154,7 @@ static int try_to_unmap_file(struct page *page, int unlock, int migration)
 					goto out;
 			}
 			vma->vm_private_data = (void *) max_nl_cursor;
+			trace_mm_filemap_unmap(vma->vm_mm, vma->vm_start+page->index);
 		}
 		cond_resched_lock(&mapping->i_mmap_lock);
 		max_nl_cursor += CLUSTER_SIZE;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 95c08a8..bed7125 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -40,6 +40,8 @@
 #include <linux/memcontrol.h>
 #include <linux/delayacct.h>
 #include <linux/sysctl.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/mm.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -417,6 +419,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			ClearPageReclaim(page);
 		}
 		inc_zone_page_state(page, NR_VMSCAN_WRITE);
+		trace_mm_pagereclaim_pgout(mapping, page->index<<PAGE_SHIFT,
+						PageAnon(page));
 		return PAGE_SUCCESS;
 	}
 
@@ -796,6 +800,7 @@ keep:
 	if (pagevec_count(&freed_pvec))
 		__pagevec_free(&freed_pvec);
 	count_vm_events(PGACTIVATE, pgactivate);
+	trace_mm_pagereclaim_free(nr_reclaimed);
 	return nr_reclaimed;
 }
 
@@ -1182,6 +1187,8 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 done:
 	local_irq_enable();
 	pagevec_release(&pvec);
+	trace_mm_pagereclaim_shrinkinactive(nr_scanned, nr_reclaimed,
+				priority);
 	return nr_reclaimed;
 }
 
@@ -1316,6 +1323,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	if (buffer_heads_over_limit)
 		pagevec_strip(&pvec);
 	pagevec_release(&pvec);
+	trace_mm_pagereclaim_shrinkactive(pgscanned, file, priority);
 }
 
 static int inactive_anon_is_low_global(struct zone *zone)
@@ -1516,6 +1524,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	}
 
 	sc->nr_reclaimed = nr_reclaimed;
+	trace_mm_pagereclaim_shrinkzone(nr_reclaimed, priority);
 
 	/*
 	 * Even if we did not try to evict anon pages at all, we want to
@@ -1678,6 +1687,8 @@ out:
 	if (priority < 0)
 		priority = 0;
 
+	trace_mm_directreclaim_reclaimall(zonelist[0]._zonerefs->zone->node,
+						sc->nr_reclaimed, priority);
 	if (scanning_global_lru(sc)) {
 		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
 
@@ -1947,6 +1958,7 @@ out:
 		goto loop_again;
 	}
 
+	trace_mm_kswapd_ran(pgdat, sc.nr_reclaimed);
 	return sc.nr_reclaimed;
 }
 
@@ -2299,7 +2311,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
-	int priority;
+	int priority = ZONE_RECLAIM_PRIORITY;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2366,6 +2378,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	p->reclaim_state = NULL;
 	current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
+	trace_mm_directreclaim_reclaimzone(zone->node,
+				sc.nr_reclaimed, priority);
 	return sc.nr_reclaimed >= nr_pages;
 }
 

Thread overview: 20+ messages
2009-04-21 22:45 [Patch] mm tracepoints update Larry Woodman
2009-04-22  1:00 ` KOSAKI Motohiro
2009-04-22  9:57   ` Ingo Molnar
2009-04-22 12:07     ` Larry Woodman
2009-04-22 19:22       ` [Patch] mm tracepoints update - use case Larry Woodman
2009-04-23  0:48         ` KOSAKI Motohiro
2009-04-23  4:50           ` Andrew Morton
2009-04-23  8:42             ` Ingo Molnar
2009-04-23 11:47               ` Larry Woodman
2009-04-24 20:48                 ` Larry Woodman
2009-06-15 18:26           ` Rik van Riel
2009-06-17 14:07             ` Larry Woodman [this message]
2009-06-18  7:57             ` KOSAKI Motohiro
2009-06-18 19:22               ` Larry Woodman
2009-06-18 19:40                 ` Rik van Riel
2009-06-22  3:37                   ` KOSAKI Motohiro
2009-06-22 15:04                     ` Larry Woodman
2009-06-23  5:52                       ` KOSAKI Motohiro
2009-06-22  3:37                 ` KOSAKI Motohiro
2009-06-22 15:28                   ` Larry Woodman
