From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161810AbdAHMbe (ORCPT ); Sun, 8 Jan 2017 07:31:34 -0500 Received: from mail.kernel.org ([198.145.29.136]:54800 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752627AbdAHMb2 (ORCPT ); Sun, 8 Jan 2017 07:31:28 -0500 Date: Sun, 8 Jan 2017 21:31:11 +0900 From: Masami Hiramatsu To: Masami Hiramatsu Cc: Peter Zijlstra , Ingo Molnar , Josh Poimboeuf , linux-kernel@vger.kernel.org, Ananth N Mavinakayanahalli , Thomas Gleixner , "H . Peter Anvin" , Andrey Konovalov , Steven Rostedt Subject: Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area Message-Id: <20170108213111.7f78e0d665783275ffd739bc@kernel.org> In-Reply-To: <20170108132242.9fe3e09e97de0a29c06178b5@kernel.org> References: <20161226133012.347f7e45dbf8a8d671ea07fb@kernel.org> <148281924021.12148.14275351848773920571.stgit@devbox> <20170103105402.GB25813@worktop.programming.kicks-ass.net> <20170104140604.5e2d53c69580f5c67ea6cd62@kernel.org> <20170104100102.GE25813@worktop.programming.kicks-ass.net> <20170108132242.9fe3e09e97de0a29c06178b5@kernel.org> X-Mailer: Sylpheed 3.5.0 (GTK+ 2.24.31; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Multipart=_Sun__8_Jan_2017_21_31_11_+0900_FzN8C=MjfamF7kR+" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --Multipart=_Sun__8_Jan_2017_21_31_11_+0900_FzN8C=MjfamF7kR+ Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Hello, On Sun, 8 Jan 2017 13:22:42 +0900 Masami Hiramatsu wrote: > > > Would you have any good benchmark to measure it? > > > > Not trivially so; what I did was cobble together a debugfs file that > > measures the average of the PMI time in perf_sample_event_took(), and a > > module that has a 10 deep callchain around a while(1) loop. Then perf > > record with callchains for a few seconds. > > > > Generating the callchain does the unwinder thing and ends up calling > > is_kernel_address() lots. > > > > The case I worked on was 0 modules vs 100+ modules in a distro build, > > which was fairly obviously painful back then, since > > is_module_text_address() used a linear lookup. > > > > I'm not sure I still have all those bits, but I can dig around a bit if > > you're interested. > > Hmm, I tried to do similar thing (make a test module which has a loop with > 10 deep recursive call and save stack-trace) on kvm, but got very unstable > results. > Maybe it needs to run on bare-metal machine. On bare-metal machine, I tried to use attached module to measure actual performance degrade. benchmarks with no kprobeq $ lsmod | wc -l 123 $ for i in {0..10}; do sudo insmod ./unwind_bench.ko && sudo rmmod unwind_bench;done [ 245.639150] Start benchmarking. [ 245.639353] End benchmarking. Elapsed: 620579 cycles, loop count:1000 [ 277.481621] Start benchmarking. [ 277.481821] End benchmarking. Elapsed: 609523 cycles, loop count:1000 [ 302.210924] Start benchmarking. [ 302.211077] End benchmarking. Elapsed: 441272 cycles, loop count:1000 [ 302.242931] Start benchmarking. [ 302.243070] End benchmarking. Elapsed: 401676 cycles, loop count:1000 [ 302.270682] Start benchmarking. [ 302.270762] End benchmarking. Elapsed: 242776 cycles, loop count:1000 [ 302.294656] Start benchmarking. [ 302.294737] End benchmarking. Elapsed: 247352 cycles, loop count:1000 [ 302.318517] Start benchmarking. [ 302.318600] End benchmarking. Elapsed: 255048 cycles, loop count:1000 [ 302.344519] Start benchmarking. [ 302.344599] End benchmarking. Elapsed: 242392 cycles, loop count:1000 [ 302.369450] Start benchmarking. [ 302.369530] End benchmarking. Elapsed: 242628 cycles, loop count:1000 [ 302.399517] Start benchmarking. [ 302.399605] End benchmarking. Elapsed: 268736 cycles, loop count:1000 [ 302.423508] Start benchmarking. [ 302.423587] End benchmarking. Elapsed: 242068 cycles, loop count:1000 [ 302.451282] Start benchmarking. [ 302.451361] End benchmarking. Elapsed: 242272 cycles, loop count:1000 [ 302.480656] Start benchmarking. [ 302.480736] End benchmarking. Elapsed: 243552 cycles, loop count:1000 benchmarks with disabled 106 kprobes (only 89 probes are optimized) # grep "[tT] audit_*" /proc/kallsyms | while read a b c d; do echo p $c+5 >> kprobe_events ;done # cat ../kprobes/list | wc -l 106 # echo 1 > events/enable # grep OPTIMIZED ../kprobes/list | wc -l 89 [ 1037.065151] Start benchmarking. [ 1037.065500] End benchmarking. Elapsed: 1067196 cycles, loop count:1000 [ 1037.113893] Start benchmarking. [ 1037.114766] End benchmarking. Elapsed: 2667646 cycles, loop count:1000 [ 1037.149251] Start benchmarking. [ 1037.149382] End benchmarking. Elapsed: 402500 cycles, loop count:1000 [ 1037.180287] Start benchmarking. [ 1037.180368] End benchmarking. Elapsed: 246000 cycles, loop count:1000 [ 1037.208416] Start benchmarking. [ 1037.208503] End benchmarking. Elapsed: 266768 cycles, loop count:1000 [ 1037.236328] Start benchmarking. [ 1037.236407] End benchmarking. Elapsed: 242396 cycles, loop count:1000 [ 1037.270207] Start benchmarking. [ 1037.270288] End benchmarking. Elapsed: 246384 cycles, loop count:1000 [ 1037.302086] Start benchmarking. [ 1037.302174] End benchmarking. Elapsed: 266796 cycles, loop count:1000 [ 1037.330165] Start benchmarking. [ 1037.330245] End benchmarking. Elapsed: 242888 cycles, loop count:1000 [ 1037.358207] Start benchmarking. [ 1037.358287] End benchmarking. Elapsed: 243288 cycles, loop count:1000 [ 1037.388134] Start benchmarking. [ 1037.388213] End benchmarking. Elapsed: 240936 cycles, loop count:1000 FYI, I also tried same benchmark without patch, but it was very unstable. Benchmarks without patch: [ 4583.832368] Start benchmarking. [ 4583.832943] End benchmarking. Elapsed: 1758512 cycles, loop count:1000 [ 4712.636096] Start benchmarking. [ 4712.636712] End benchmarking. Elapsed: 1881112 cycles, loop count:1000 [ 4712.668672] Start benchmarking. [ 4712.669254] End benchmarking. Elapsed: 1777584 cycles, loop count:1000 [ 4712.696150] Start benchmarking. [ 4712.696925] End benchmarking. Elapsed: 2324292 cycles, loop count:1000 [ 4712.727850] Start benchmarking. [ 4712.727988] End benchmarking. Elapsed: 422768 cycles, loop count:1000 [ 4712.756108] Start benchmarking. [ 4712.756684] End benchmarking. Elapsed: 1755676 cycles, loop count:1000 [ 4712.780775] Start benchmarking. [ 4712.781347] End benchmarking. Elapsed: 1749872 cycles, loop count:1000 [ 4712.811910] Start benchmarking. [ 4712.812510] End benchmarking. Elapsed: 1825972 cycles, loop count:1000 [ 4712.844676] Start benchmarking. [ 4712.845270] End benchmarking. Elapsed: 1815532 cycles, loop count:1000 [ 4712.876057] Start benchmarking. [ 4712.876203] End benchmarking. Elapsed: 421700 cycles, loop count:1000 [ 4712.908149] Start benchmarking. [ 4712.908741] End benchmarking. Elapsed: 1777924 cycles, loop count:1000 [ 4712.943254] Start benchmarking. [ 4712.943823] End benchmarking. Elapsed: 1754012 cycles, loop count:1000 Thank you, -- Masami Hiramatsu --Multipart=_Sun__8_Jan_2017_21_31_11_+0900_FzN8C=MjfamF7kR+ Content-Type: text/plain; name="unwind_bench.c" Content-Disposition: attachment; filename="unwind_bench.c" Content-Transfer-Encoding: 7bit #include #include #include #include #define MAX_LOOP 1000 #define MAX_CALL 10 #define MAX_STACK 32 unsigned long stack_entries[MAX_STACK]; void stackdump(void) { struct stack_trace st; st.max_entries = MAX_STACK; st.entries = stack_entries; save_stack_trace(&st); } void alpha(unsigned long call) { if (call) alpha(--call); else stackdump(); } static void bench_loop(void) { u64 st, ed; unsigned long count; unsigned long flags; pr_info("Start benchmarking.\n"); local_irq_save(flags); st = get_cycles(); for (count = 0; count < MAX_LOOP; count++) alpha(MAX_CALL); ed = get_cycles(); local_irq_restore(flags); pr_info("End benchmarking. Elapsed: %lu cycles, loop count:%lu\n", (unsigned long)(ed - st), count); } static int __init bench_init(void) { bench_loop(); return 0; } static void __exit bench_exit(void) { } module_init(bench_init) module_exit(bench_exit) MODULE_LICENSE("GPL"); --Multipart=_Sun__8_Jan_2017_21_31_11_+0900_FzN8C=MjfamF7kR+--