* [page fault tracepoint 1/2] Add page fault trace event definitions
@ 2013-05-09  6:05 Francis Deslauriers

From: Francis Deslauriers @ 2013-05-09  6:05 UTC
To: linux-mm, tglx, mingo, hpa, x86, rostedt, fweisbec
Cc: raphael.beamonte, mathieu.desnoyers, linux-kernel, Francis Deslauriers

Add page_fault_entry and page_fault_exit event definitions. It will
allow each architecture to instrument their page faults.

Signed-off-by: Francis Deslauriers <fdeslaur@gmail.com>
Reviewed-by: Raphaël Beamonte <raphael.beamonte@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/trace/events/fault.h | 51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 include/trace/events/fault.h

diff --git a/include/trace/events/fault.h b/include/trace/events/fault.h
new file mode 100644
index 0000000..522ddee
--- /dev/null
+++ b/include/trace/events/fault.h
@@ -0,0 +1,51 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fault
+
+#if !defined(_TRACE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FAULT_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(page_fault_entry,
+
+	TP_PROTO(struct pt_regs *regs, unsigned long address,
+			int write_access),
+
+	TP_ARGS(regs, address, write_access),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	ip	)
+		__field(	unsigned long,	addr	)
+		__field(	uint8_t,	write	)
+	),
+
+	TP_fast_assign(
+		__entry->ip	= regs ? instruction_pointer(regs) : 0UL;
+		__entry->addr	= address;
+		__entry->write	= !!write_access;
+	),
+
+	TP_printk("ip=%lu addr=%lu write_access=%d",
+		__entry->ip, __entry->addr, __entry->write)
+);
+
+TRACE_EVENT(page_fault_exit,
+
+	TP_PROTO(int result),
+
+	TP_ARGS(result),
+
+	TP_STRUCT__entry(
+		__field(	int,	res	)
+	),
+
+	TP_fast_assign(
+		__entry->res	= result;
+	),
+
+	TP_printk("result=%d", __entry->res)
+);
+
+#endif /* _TRACE_FAULT_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
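The record layout and formatting of page_fault_entry can be modelled in ordinary userspace C to see what a trace reader would get. This is only a sketch: struct fake_regs stands in for struct pt_regs, and pf_entry_assign() / pf_entry_format() are hypothetical names mirroring TP_fast_assign and TP_printk; they are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct pt_regs: only the saved instruction pointer. */
struct fake_regs {
	unsigned long ip;
};

/* Mirrors TP_STRUCT__entry of page_fault_entry. */
struct page_fault_entry_rec {
	unsigned long ip;
	unsigned long addr;
	uint8_t write;
};

/* Mirrors TP_fast_assign: NULL regs gives ip 0, write_access becomes 0 or 1. */
static void pf_entry_assign(struct page_fault_entry_rec *e,
			    const struct fake_regs *regs,
			    unsigned long address, int write_access)
{
	e->ip = regs ? regs->ip : 0UL;
	e->addr = address;
	e->write = !!write_access;
}

/* Mirrors TP_printk; returns what snprintf returns. */
static int pf_entry_format(const struct page_fault_entry_rec *e,
			   char *buf, size_t len)
{
	return snprintf(buf, len, "ip=%lu addr=%lu write_access=%d",
			e->ip, e->addr, e->write);
}
```

Two details of the definition show up directly: callers that pass a NULL regs (as the __get_user_pages() hunk in patch 2/2 does) get ip=0, and !!write_access collapses a flag such as error_code & PF_WRITE (which evaluates to 2 on a write fault, assuming x86's PF_WRITE is bit 1) to 0 or 1 before it is stored. Note also that %lu prints both ip and addr in decimal.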
* [page fault tracepoint 2/2] x86:Instruments page fault trace event
  2013-05-09  6:05 [page fault tracepoint 1/2] Add page fault trace event definitions Francis Deslauriers
@ 2013-05-09  6:05 ` Francis Deslauriers

From: Francis Deslauriers @ 2013-05-09  6:05 UTC
To: linux-mm, tglx, mingo, hpa, x86, rostedt, fweisbec
Cc: raphael.beamonte, mathieu.desnoyers, linux-kernel, Francis Deslauriers

Signed-off-by: Francis Deslauriers <fdeslaur@gmail.com>
Reviewed-by: Raphaël Beamonte <raphael.beamonte@gmail.com>
---
 arch/x86/mm/fault.c | 11 +++++++++++
 mm/memory.c         |  5 +++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 654be4a..e227828 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,9 @@
 #include <asm/kmemcheck.h>		/* kmemcheck_*(), ...		*/
 #include <asm/fixmap.h>			/* VSYSCALL_START		*/
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/fault.h>		/* trace_page_fault_*(), ...	*/
+
 /*
  * Page fault error code bits:
  *
@@ -756,12 +759,18 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		if (likely(show_unhandled_signals))
 			show_signal_msg(regs, error_code, address, tsk);
 
+		trace_page_fault_entry(regs, address, error_code & PF_WRITE);
 		tsk->thread.cr2		= address;
 		tsk->thread.error_code	= error_code;
 		tsk->thread.trap_nr	= X86_TRAP_PF;
 
 		force_sig_info_fault(SIGSEGV, si_code, address, tsk, 0);
 
+		/*
+		 * Using -1 here, since there is no VM_FAULT flag to identify
+		 * user accesses triggering SIGSEGV.
+		 */
+		trace_page_fault_exit(-1);
 		return;
 	}
 
@@ -1185,7 +1194,9 @@ good_area:
 	 * make sure we exit gracefully rather than endlessly redo
 	 * the fault:
 	 */
+	trace_page_fault_entry(regs, address, write);
 	fault = handle_mm_fault(mm, vma, address, flags);
+	trace_page_fault_exit(fault);
 
 	if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) {
 		if (mm_fault_error(regs, error_code, address, fault))
diff --git a/mm/memory.c b/mm/memory.c
index 6dc1882..0bd86f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -67,6 +67,8 @@
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
 
+#include <trace/events/fault.h>
+
 #include "internal.h"
 
 #ifdef LAST_NID_NOT_IN_PAGE_FLAGS
@@ -1829,8 +1831,11 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				if (foll_flags & FOLL_NOWAIT)
 					fault_flags |= (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT);
 
+				trace_page_fault_entry(0, start,
+					foll_flags & FOLL_WRITE);
 				ret = handle_mm_fault(mm, vma, start,
 							fault_flags);
+				trace_page_fault_exit(ret);
 
 				if (ret & VM_FAULT_ERROR) {
 					if (ret & VM_FAULT_OOM)
--
1.7.10.4
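The exit event carries either the raw handle_mm_fault() return value or the -1 sentinel used on the SIGSEGV path. A trace consumer could bucket those values as sketched below. The VM_FAULT_* constants are copied here as assumptions from include/linux/mm.h of that period (verify them against the traced kernel), and pf_exit_kind() is a hypothetical helper, not something the patch provides.

```c
#include <assert.h>
#include <string.h>

/*
 * VM_FAULT_* bits as assumed from include/linux/mm.h circa 3.9;
 * check them against the kernel you are tracing.
 */
#define VM_FAULT_OOM            0x0001
#define VM_FAULT_SIGBUS         0x0002
#define VM_FAULT_MAJOR          0x0004
#define VM_FAULT_HWPOISON       0x0010
#define VM_FAULT_HWPOISON_LARGE 0x0020
#define VM_FAULT_RETRY          0x0400
#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | \
			VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)

/* Hypothetical helper: classify the "result" field of page_fault_exit. */
static const char *pf_exit_kind(int result)
{
	if (result == -1)		/* bad-area path: SIGSEGV, no VM_FAULT code */
		return "sigsegv";
	if (result & VM_FAULT_ERROR)	/* OOM, SIGBUS or hardware poison */
		return "error";
	if (result & VM_FAULT_RETRY)	/* handler will retry the fault */
		return "retry";
	if (result & VM_FAULT_MAJOR)	/* needed I/O */
		return "major";
	return "minor";
}
```

The -1 test comes first on purpose: in two's complement -1 has every bit set, so it would otherwise also match VM_FAULT_ERROR.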
* Re: [page fault tracepoint 1/2] Add page fault trace event definitions
  2013-05-09  6:05 [page fault tracepoint 1/2] Add page fault trace event definitions Francis Deslauriers
@ 2013-05-09  6:46 ` zhangwei(Jovi)

From: zhangwei(Jovi) @ 2013-05-09  6:46 UTC
To: Francis Deslauriers
Cc: linux-mm, tglx, mingo, hpa, x86, rostedt, fweisbec, raphael.beamonte, mathieu.desnoyers, linux-kernel

On 2013/5/9 14:05, Francis Deslauriers wrote:
> Add page_fault_entry and page_fault_exit event definitions. It will
> allow each architecture to instrument their page faults.

I'm wondering if this tracepoint could handle other page faults,
like faults in kernel memory(vmalloc, kmmio, etc...)

And if we decide to support those faults, add a type annotate in TP_printk
would be much helpful for user, to let user know what type of page faults happened.

Thanks.
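zhangwei's suggestion of a fault-type annotation could take the form of an extra integer field that is turned into a name when the event is printed; inside TP_printk that kind of mapping is usually written with the kernel's __print_symbolic() helper. The userspace sketch below only models the lookup. The categories (user, vmalloc, kmmio) are hypothetical, taken from the examples in the message, and are not defined by either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fault categories; not defined by the patch set. */
enum pf_type { PF_TYPE_USER, PF_TYPE_VMALLOC, PF_TYPE_KMMIO };

struct pf_type_name {
	int value;
	const char *name;
};

/* Value-to-name table in the spirit of __print_symbolic(val, {v, "name"}, ...). */
static const struct pf_type_name pf_type_names[] = {
	{ PF_TYPE_USER,    "user" },
	{ PF_TYPE_VMALLOC, "vmalloc" },
	{ PF_TYPE_KMMIO,   "kmmio" },
};

/* Linear lookup; this sketch falls back to "unknown" for unlisted values. */
static const char *pf_type_str(int value)
{
	size_t i;

	for (i = 0; i < sizeof(pf_type_names) / sizeof(pf_type_names[0]); i++)
		if (pf_type_names[i].value == value)
			return pf_type_names[i].name;
	return "unknown";
}
```

Storing a small integer in the ring buffer and decoding it only at print time keeps each event record compact.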
* Re: [page fault tracepoint 1/2] Add page fault trace event definitions
  2013-05-09  6:46 ` [page fault tracepoint 1/2] Add page fault trace event definitions zhangwei(Jovi)
@ 2013-05-09 13:48 ` H. Peter Anvin

From: H. Peter Anvin @ 2013-05-09 13:48 UTC
To: zhangwei(Jovi)
Cc: Francis Deslauriers, linux-mm, tglx, mingo, x86, rostedt, fweisbec, raphael.beamonte, mathieu.desnoyers, linux-kernel

On 05/08/2013 11:46 PM, zhangwei(Jovi) wrote:
> On 2013/5/9 14:05, Francis Deslauriers wrote:
>> Add page_fault_entry and page_fault_exit event definitions. It will
>> allow each architecture to instrument their page faults.
>
> I'm wondering if this tracepoint could handle other page faults,
> like faults in kernel memory(vmalloc, kmmio, etc...)
>
> And if we decide to support those faults, add a type annotate in TP_printk
> would be much helpful for user, to let user know what type of page faults happened.
>

The plan for x86 was to switch the IDT so that any exception could get a
trace event without any overhead in normal operation.  This has been in
the process for quite some time but looks like it was getting very close.

	-hpa
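The idea hpa describes can be illustrated with a dispatch-table analogy: keep two handler tables and switch which one is active when tracing is turned on, so the normal path never tests whether tracing is enabled. This userspace C model is only an analogy under that reading; it does not touch a real IDT, and every name in it is hypothetical. The only x86 facts it borrows are that the IDT has 256 entries and that #PF is vector 14.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS 256	/* x86 IDT size */
#define PF_VECTOR  14	/* x86 page-fault vector (#PF) */

typedef void (*handler_t)(int vector);

static unsigned long handled;	/* faults handled, traced or not */
static unsigned long traced;	/* faults that went through the trace stub */

/* Normal handler: no tracing check at all. */
static void do_fault(int vector)
{
	(void)vector;
	handled++;
}

/* Trace stub: record the event, then run the normal handler. */
static void trace_fault(int vector)
{
	traced++;
	do_fault(vector);
}

static handler_t normal_table[NR_VECTORS];
static handler_t trace_table[NR_VECTORS];

/* The "current IDT" in this analogy. */
static handler_t *active_table = normal_table;

static void tables_init(void)
{
	normal_table[PF_VECTOR] = do_fault;
	trace_table[PF_VECTOR] = trace_fault;
}

/* Enabling tracing swaps the whole table, analogous to loading another IDT. */
static void set_tracing(int on)
{
	active_table = on ? trace_table : normal_table;
}

/* Dispatch goes through whichever table is active. */
static void raise_exception(int vector)
{
	active_table[vector](vector);
}
```

In this model the cost of tracing moves entirely into the moment the table is swapped, rather than into a branch executed on every fault.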
* Re: [page fault tracepoint 1/2] Add page fault trace event definitions
  2013-05-09 13:48 ` H. Peter Anvin
@ 2013-05-13 11:21 ` Mathieu Desnoyers

From: Mathieu Desnoyers @ 2013-05-13 11:21 UTC
To: H. Peter Anvin
Cc: zhangwei(Jovi), Francis Deslauriers, linux-mm, tglx, mingo, x86, rostedt, fweisbec, raphael.beamonte, linux-kernel, Andrea Arcangeli

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 05/08/2013 11:46 PM, zhangwei(Jovi) wrote:
> > On 2013/5/9 14:05, Francis Deslauriers wrote:
> >> Add page_fault_entry and page_fault_exit event definitions. It will
> >> allow each architecture to instrument their page faults.
> >
> > I'm wondering if this tracepoint could handle other page faults,
> > like faults in kernel memory(vmalloc, kmmio, etc...)
> >
> > And if we decide to support those faults, add a type annotate in TP_printk
> > would be much helpful for user, to let user know what type of page faults happened.
> >
>
> The plan for x86 was to switch the IDT so that any exception could get a
> trace event without any overhead in normal operation.  This has been in
> the process for quite some time but looks like it was getting very close.

Hi Peter,

Who is leading this IDT instrumentation effort ?

Since we have tracepoints in interrupt handlers nowadays, I wonder what
makes traps so much more special than interrupts to require the
arch-specific complexity of the IDT switcharoo trick ?

If I had to guess, the reason for this would be the page fault handler,
which is called way too frequently for its own good. The number of page
faults triggered by COW on process fork has been impressively high for
the past couple of years.

IMHO, this should be one extra reason for quickly allowing people to
trace those page faults, so they can get an idea of their tremendous
performance impact.
This could speed up the efforts on transparent huge pages, which seems
to be a viable long-term solution to this page-size scalability issue.

By default, my 3.5 Linux kernel (Debian) has:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

I think transparent huge pages will become generally useful when enabled
by default, and when they will handle the page cache in addition to
anonymous pages.[1]

Thanks,

Mathieu

[1] Documentation/vm/transhuge.txt

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
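The sysfs file Mathieu quotes marks the active policy with square brackets. A small sketch for reading it programmatically follows; thp_active_mode() and thp_read_mode() are hypothetical helpers, and the file is only present on kernels built with transparent huge page support.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the bracketed (active) choice from a sysfs selector line such as
 * "always [madvise] never". Returns 0 on success, -1 if no [...] is found
 * or the output buffer is too small.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	if (!open || !close)
		return -1;
	n = (size_t)(close - open - 1);
	if (n + 1 > outlen)
		return -1;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the file quoted above; returns -1 if it is absent or unreadable. */
static int thp_read_mode(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return thp_active_mode(line, out, outlen);
}
```

On the system Mathieu describes, thp_read_mode() would yield "madvise", meaning huge pages are only used for regions the application marks with madvise(MADV_HUGEPAGE).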
* Re: [page fault tracepoint 1/2] Add page fault trace event definitions
  2013-05-13 11:21 ` Mathieu Desnoyers
@ 2013-05-13 15:08 ` Steven Rostedt

From: Steven Rostedt @ 2013-05-13 15:08 UTC
To: Mathieu Desnoyers
Cc: H. Peter Anvin, zhangwei(Jovi), Francis Deslauriers, linux-mm, tglx, mingo, x86, fweisbec, raphael.beamonte, linux-kernel, Andrea Arcangeli, Seiji Aguchi

On Mon, 2013-05-13 at 07:21 -0400, Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:

> Who is leading this IDT instrumentation effort ?
>

Seiji has been doing most of the work. I've just been busy doing other
things but I need to start getting this tidied up, and hopefully this
can get into 3.11.

https://lkml.org/lkml/2013/4/5/401

-- Steve