public inbox for linux-kernel@vger.kernel.org
* [PATCH] perfcounters: fix "perf counters kills oprofile" bug
@ 2009-02-05  6:00 Mike Galbraith
  2009-02-05 14:26 ` Ingo Molnar
  2009-02-07 11:47 ` kerneltop "enhancements" Mike Galbraith
  0 siblings, 2 replies; 4+ messages in thread
From: Mike Galbraith @ 2009-02-05  6:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arjan van de Ven, Thomas Gleixner, Paul Mackerras,
	LKML

[-- Attachment #1: Type: text/plain, Size: 3116 bytes --]

Impact: fix "perf counters kills oprofile" bug

Both oprofile and perfcounters register an NMI die handler, but only one
can handle the NMI.  Conveniently, oprofile unregisters its notifier
when not actively in use, so setting its notifier priority higher than
perfcounter's allows oprofile to borrow the NMI for the duration of its
run.  Tested/works both as module and built-in.

While testing, I found that if kerneltop was generating NMIs at very
high frequency, the kernel could panic when oprofile registered its
handler.  This turned out to be because oprofile registers its handler
before reset_value has been allocated, so if an NMI comes in while it's
still setting up, kabOom.  Rather than try more invasive changes, I
followed the lead of other places in op_model_ppro.c, and simply
returned in that highly unlikely event.  (debug warnings attached)

I can break this into two patches if you prefer, but since the panic was
initiated by borrowing the active NMI, I figured they belong together.

Signed-off-by: Mike Galbraith <efault@gmx.de>

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 46c436c..8bb2133 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -643,7 +643,9 @@ perf_counter_nmi_handler(struct notifier_block *self,
 }
 
 static __read_mostly struct notifier_block perf_counter_nmi_notifier = {
-	.notifier_call		= perf_counter_nmi_handler
+	.notifier_call		= perf_counter_nmi_handler,
+	.next			= NULL,
+	.priority		= 1
 };
 
 void __init init_hw_perf_counters(void)
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 202864a..c638685 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -40,8 +40,9 @@ static int profile_exceptions_notify(struct notifier_block *self,
 
 	switch (val) {
 	case DIE_NMI:
-		if (model->check_ctrs(args->regs, &per_cpu(cpu_msrs, cpu)))
-			ret = NOTIFY_STOP;
+	case DIE_NMI_IPI:
+		model->check_ctrs(args->regs, &per_cpu(cpu_msrs, cpu));
+		ret = NOTIFY_STOP;
 		break;
 	default:
 		break;
@@ -134,7 +135,7 @@ static void nmi_cpu_setup(void *dummy)
 static struct notifier_block profile_exceptions_nb = {
 	.notifier_call = profile_exceptions_notify,
 	.next = NULL,
-	.priority = 0
+	.priority = 2
 };
 
 static int nmi_setup(void)
diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 07c9145..85eb626 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -126,6 +126,13 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 	u64 val;
 	int i;
 
+	/*
+	 * This can happen if perf counters are in use when
+	 * we steal the die notifier NMI.
+	 */
+	if (unlikely(!reset_value))
+		goto out;
+
 	for (i = 0 ; i < num_counters; ++i) {
 		if (!reset_value[i])
 			continue;
@@ -136,6 +143,7 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 		}
 	}
 
+out:
 	/* Only P6 based Pentium M need to re-unmask the apic vector but it
 	 * doesn't hurt other P6 variant */
 	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);


[-- Attachment #2: warning --]
[-- Type: text/plain, Size: 5665 bytes --]

[  251.783042] oprofile: using NMI interrupt.
[  252.107966] ------------[ cut here ]------------
[  252.107973] ------------[ cut here ]------------
[  252.107980] WARNING: at arch/x86/oprofile/op_model_ppro.c:132 ppro_check_ctrs+0x37/0xd9 [oprofile]()
[  252.107982] Hardware name: MS-7502
[  252.107984] Modules linked in: oprofile nfsd lockd snd_pcm_oss nfs_acl snd_mixer_oss auth_rpcgss snd_seq exportfs snd_seq_device sunrpc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq ip_tables ip6_tables microcode nls_iso8859_1 nls_cp437 vfat fat fuse loop dm_mod hid_pl hid_cypress hid_zpff hid_gyration hid_sony hid_samsung hid_microsoft hid_tmff hid_monterey snd_hda_codec_realtek hid_ezkey hid_a4tech hid_logitech firewire_ohci snd_hda_intel ff_memless firewire_core hid_cherry snd_hda_codec crc_itu_t hid_sunplus hid_petalynx snd_hwdep usbhid hid_belkin snd_pcm hid_chicony snd_timer usb_storage ohci1394 i2c_i801 snd rtc_cmos hid soundcore libusual sr_mod rtc_core ieee1394 e1000e button intel_agp cdrom rtc_lib i2c_core snd_page_alloc sg ehci_hcd uhci_hcd sd_mod usbcore edd ext3 mbcache jbd fan ahci libata scsi_mod thermal processor [last unloaded: oprofile]
[  252.108029] Pid: 9407, comm: oprofiled Not tainted 2.6.29-tip-smp #121
[  252.108032] Call Trace:
[  252.108033]  <NMI>  [<ffffffff80237f1f>] warn_slowpath+0xd3/0x10f
[  252.108045]  [<ffffffff80273745>] ? rb_reserve_next_event+0x1a5/0x333
[  252.108049]  [<ffffffff80273a19>] ? ring_buffer_lock_reserve+0x83/0xca
[  252.108055]  [<ffffffff802182b1>] ? __smp_perf_counter_interrupt+0x348/0x3bc
[  252.108064]  [<ffffffffa03eab52>] ppro_check_ctrs+0x37/0xd9 [oprofile]
[  252.108073]  [<ffffffffa03e9f6d>] profile_exceptions_notify+0x39/0x40 [oprofile]
[  252.108077]  [<ffffffff8024e38b>] notifier_call_chain+0x33/0x5b
[  252.108080]  [<ffffffff8024e3d5>] atomic_notifier_call_chain+0x13/0x15
[  252.108083]  [<ffffffff8024e477>] notify_die+0x2e/0x30
[  252.108086]  [<ffffffff8020dd27>] do_nmi+0x86/0x21b
[  252.108094]  [<ffffffffa03ea401>] ? nmi_setup+0xfb/0x1b8 [oprofile]
[  252.108100]  [<ffffffff8047a99a>] nmi+0x1a/0x20
[  252.108108]  [<ffffffffa03ea401>] ? nmi_setup+0xfb/0x1b8 [oprofile]
[  252.108112]  [<ffffffff802587b2>] ? smp_call_function_many+0x1d2/0x1e3
[  252.108114]  <<EOE>>  [<ffffffffa03ea4be>] ? nmi_cpu_setup+0x0/0x7c [oprofile]
[  252.108126]  [<ffffffff8029634e>] ? map_vm_area+0x2d/0x40
[  252.108134]  [<ffffffffa03ea4be>] ? nmi_cpu_setup+0x0/0x7c [oprofile]
[  252.108137]  [<ffffffff802587e3>] smp_call_function+0x20/0x24
[  252.108141]  [<ffffffff8023cc71>] on_each_cpu+0x18/0x2c
[  252.108149]  [<ffffffffa03ea4a2>] nmi_setup+0x19c/0x1b8 [oprofile]
[  252.108157]  [<ffffffffa03e92a5>] ? event_buffer_open+0x0/0x6d [oprofile]
[  252.108165]  [<ffffffffa03e817d>] oprofile_setup+0x39/0xa4 [oprofile]
[  252.108173]  [<ffffffffa03e92f0>] event_buffer_open+0x4b/0x6d [oprofile]
[  252.108177]  [<ffffffff8029d43b>] __dentry_open+0x14c/0x265
[  252.108180]  [<ffffffff8029d621>] nameidata_to_filp+0x41/0x52
[  252.108184]  [<ffffffff802a9c48>] do_filp_open+0x448/0x897
[  252.108188]  [<ffffffff80294dda>] ? page_add_new_anon_rmap+0x5a/0x5f
[  252.108191]  [<ffffffff8028d7d4>] ? handle_mm_fault+0x290/0x676
[  252.108195]  [<ffffffff802b2014>] ? alloc_fd+0x6d/0x116
[  252.108198]  [<ffffffff8029d232>] do_sys_open+0x53/0xd3
[  252.108201]  [<ffffffff8029d2db>] sys_open+0x1b/0x1d
[  252.108204]  [<ffffffff8020be5b>] system_call_fastpath+0x16/0x1b
[  252.108207] ---[ end trace caef0a6178020015 ]---
[  252.111949] WARNING: at arch/x86/oprofile/op_model_ppro.c:132 ppro_check_ctrs+0x37/0xd9 [oprofile]()
[  252.111949] Hardware name: MS-7502
[  252.111949] Modules linked in: oprofile nfsd lockd snd_pcm_oss nfs_acl snd_mixer_oss auth_rpcgss snd_seq exportfs snd_seq_device sunrpc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq ip_tables ip6_tables microcode nls_iso8859_1 nls_cp437 vfat fat fuse loop dm_mod hid_pl hid_cypress hid_zpff hid_gyration hid_sony hid_samsung hid_microsoft hid_tmff hid_monterey snd_hda_codec_realtek hid_ezkey hid_a4tech hid_logitech firewire_ohci snd_hda_intel ff_memless firewire_core hid_cherry snd_hda_codec crc_itu_t hid_sunplus hid_petalynx snd_hwdep usbhid hid_belkin snd_pcm hid_chicony snd_timer usb_storage ohci1394 i2c_i801 snd rtc_cmos hid soundcore libusual sr_mod rtc_core ieee1394 e1000e button intel_agp cdrom rtc_lib i2c_core snd_page_alloc sg ehci_hcd uhci_hcd sd_mod usbcore edd ext3 mbcache jbd fan ahci libata scsi_mod thermal processor [last unloaded: oprofile]
[  252.111949] Pid: 0, comm: swapper Tainted: G        W  2.6.29-tip-smp #121
[  252.111949] Call Trace:
[  252.111949]  <NMI>  [<ffffffff80237f1f>] warn_slowpath+0xd3/0x10f
[  252.111949]  [<ffffffff80273745>] ? rb_reserve_next_event+0x1a5/0x333
[  252.111949]  [<ffffffff80273a19>] ? ring_buffer_lock_reserve+0x83/0xca
[  252.111949]  [<ffffffff802182b1>] ? __smp_perf_counter_interrupt+0x348/0x3bc
[  252.111949]  [<ffffffffa03eab52>] ppro_check_ctrs+0x37/0xd9 [oprofile]
[  252.111949]  [<ffffffffa03e9f6d>] profile_exceptions_notify+0x39/0x40 [oprofile]
[  252.111949]  [<ffffffff8024e38b>] notifier_call_chain+0x33/0x5b
[  252.111949]  [<ffffffff8024e3d5>] atomic_notifier_call_chain+0x13/0x15
[  252.111949]  [<ffffffff8024e477>] notify_die+0x2e/0x30
[  252.111949]  [<ffffffff8020dd27>] do_nmi+0x86/0x21b
[  252.111949]  [<ffffffff8047a99a>] nmi+0x1a/0x20
[  252.111949]  [<ffffffff8021267b>] ? default_idle+0x2b/0x40
[  252.111949]  <<EOE>>  [<ffffffff8020b141>] cpu_idle+0x52/0x93
[  252.111949]  [<ffffffff804755cb>] start_secondary+0x191/0x196
[  252.111949] ---[ end trace caef0a6178020016 ]---

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] perfcounters: fix "perf counters kills oprofile" bug
  2009-02-05  6:00 [PATCH] perfcounters: fix "perf counters kills oprofile" bug Mike Galbraith
@ 2009-02-05 14:26 ` Ingo Molnar
  2009-02-07 11:47 ` kerneltop "enhancements" Mike Galbraith
  1 sibling, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2009-02-05 14:26 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Arjan van de Ven, Thomas Gleixner, Paul Mackerras,
	LKML


* Mike Galbraith <efault@gmx.de> wrote:

> Impact: fix "perf counters kills oprofile" bug
> 
> Both oprofile and perfcounters register an NMI die handler, but only one
> can handle the NMI.  Conveniently, oprofile unregisters its notifier
> when not actively in use, so setting its notifier priority higher than
> perfcounter's allows oprofile to borrow the NMI for the duration of its
> run.  Tested/works both as module and built-in.
> 
> While testing, I found that if kerneltop was generating NMIs at very
> high frequency, the kernel could panic when oprofile registered its
> handler.  This turned out to be because oprofile registers its handler
> before reset_value has been allocated, so if an NMI comes in while it's
> still setting up, kabOom.  Rather than try more invasive changes, I
> followed the lead of other places in op_model_ppro.c, and simply
> returned in that highly unlikely event.  (debug warnings attached)

applied to tip:perfcounters/core, thanks Mike!

> I can break this into two patches if you prefer, but since the panic was 
> initiated by borrowing the active NMI, I figured they belong together.

No need, it's fine this way. Note that there are two commits from you: i 
applied your earlier fix of this already, and now i did a delta patch of the 
other bug you found. The delta patch i have applied is below.

Thanks,

	Ingo

--------------------->
From 82aa9a1829199233f9bdaf26e2ee271114f4701e Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 5 Feb 2009 15:23:08 +0100
Subject: [PATCH] perfcounters: fix "perf counters kills oprofile" bug, v2

Impact: fix kernel crash

Both oprofile and perfcounters register an NMI die handler, but only one
can handle the NMI.  Conveniently, oprofile unregisters its notifier
when not actively in use, so setting its notifier priority higher than
perfcounter's allows oprofile to borrow the NMI for the duration of its
run.  Tested/works both as module and built-in.

While testing, I found that if kerneltop was generating NMIs at very
high frequency, the kernel could panic when oprofile registered its
handler.  This turned out to be because oprofile registers its handler
before reset_value has been allocated, so if an NMI comes in while it's
still setting up, kabOom.  Rather than try more invasive changes, I
followed the lead of other places in op_model_ppro.c, and simply
returned in that highly unlikely event.  (debug warnings attached)

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/oprofile/op_model_ppro.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 07c9145..85eb626 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -126,6 +126,13 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 	u64 val;
 	int i;
 
+	/*
+	 * This can happen if perf counters are in use when
+	 * we steal the die notifier NMI.
+	 */
+	if (unlikely(!reset_value))
+		goto out;
+
 	for (i = 0 ; i < num_counters; ++i) {
 		if (!reset_value[i])
 			continue;
@@ -136,6 +143,7 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 		}
 	}
 
+out:
 	/* Only P6 based Pentium M need to re-unmask the apic vector but it
 	 * doesn't hurt other P6 variant */
 	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);


* kerneltop "enhancements"
  2009-02-05  6:00 [PATCH] perfcounters: fix "perf counters kills oprofile" bug Mike Galbraith
  2009-02-05 14:26 ` Ingo Molnar
@ 2009-02-07 11:47 ` Mike Galbraith
  2009-02-08  1:17   ` Paul Mackerras
  1 sibling, 1 reply; 4+ messages in thread
From: Mike Galbraith @ 2009-02-07 11:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arjan van de Ven, Thomas Gleixner, Paul Mackerras,
	LKML

Greetings,

Dunno if you're interested in any of this...

While tinkering, trying to figure out why I receive more events when
using -C 0 than when profiling all CPUs, I found that it's because of
blocking reads jamming up the loop.  I also noticed that when using
poll, my event count when not using -C N would rise to the expected
level, but it maxed out at around 25k events/sec for the whole box.
That turned out to be...
 
	if (!event_array[i][counter].revents)
		continue;

I guess I need to look into that a bit more, but without it, with poll
used only to pause the loop once non-blocking reads have drained
everything available, things improved a bunch.  However, when only one
CPU was busy, default_idle ended up screwing things up.  Below is what
I did about all this.

Now, with netperf TCP_RR running on one CPU...
------------------------------------------------------------------------------
 KernelTop:   23686 irqs/sec  kernel:95.8% [NMI, 100000 CPU cycles],  (all, 4 CPUs)
------------------------------------------------------------------------------

             events         RIP          kernel function
  ______     ______   ________________   _______________

            6972.00 - ffffffff803f1c66 : tcp_ack
            6290.00 - ffffffff8047bb7f : __schedule
            4733.00 - ffffffff803ec4df : tcp_sendmsg
            4437.00 - ffffffff803f5f96 : tcp_transmit_skb
            4080.00 - ffffffff8047df48 : _spin_lock_irqsave
            3685.00 - ffffffff8020a6a8 : __switch_to
            3502.00 - ffffffff8047e0ac : _spin_lock_bh
            3477.00 - ffffffff803eb543 : tcp_recvmsg
            3227.00 - ffffffff803fbcbf : tcp_v4_rcv
            3091.00 - ffffffff803f3df5 : tcp_rcv_established
            3052.00 - ffffffff8020be10 : system_call
            2376.00 - ffffffff8047dfd2 : _spin_lock
            2153.00 - ffffffff803c2630 : net_rx_action
            2129.00 - ffffffff803e7526 : __inet_lookup_established
            2071.00 - ffffffff803c3b28 : netif_receive_skb
            2055.00 - ffffffff8023ce9f : local_bh_enable
            2043.00 - ffffffff80211c88 : native_sched_clock
            2010.00 - ffffffff803e069a : ip_rcv
            1910.00 - ffffffff8047e01e : _spin_unlock_irqrestore
..or on all CPUS..

------------------------------------------------------------------------------
 KernelTop:   93783 irqs/sec  kernel:96.0% [NMI, 100000 CPU cycles],
(all, 4 CPUs)
------------------------------------------------------------------------------

             events         RIP          kernel function
  ______     ______   ________________   _______________

           30798.00 - ffffffff803f1c66 : tcp_ack
           26258.00 - ffffffff803ec4df : tcp_sendmsg
           20500.00 - ffffffff8047bb7f : __schedule
           19242.00 - ffffffff803f5f96 : tcp_transmit_skb
           14895.00 - ffffffff803eb543 : tcp_recvmsg
           13926.00 - ffffffff8047e0ac : _spin_lock_bh
           13015.00 - ffffffff803f3df5 : tcp_rcv_established
           12699.00 - ffffffff803e4945 : ip_queue_xmit
           12213.00 - ffffffff8047df48 : _spin_lock_irqsave
           11770.00 - ffffffff803fbcbf : tcp_v4_rcv
           11585.00 - ffffffff8020a6a8 : __switch_to
           11391.00 - ffffffff8020be10 : system_call
            9522.00 - ffffffff803e4fdc : ip_finish_output
            9247.00 - ffffffff803f83a8 : tcp_write_xmit
            8764.00 - ffffffff803c851a : dst_release
            8597.00 - ffffffff8023ce9f : local_bh_enable
            8301.00 - ffffffff8047dfd2 : _spin_lock
            8261.00 - ffffffff803c2630 : net_rx_action
            8224.00 - ffffffff803c3b28 : netif_receive_skb


BTW, how does one convince getopt_long() that an option really really
doesn't require an argument?  no_argument works for the long version,
but it insists that an argument is required for the short version
regardless.

--- kerneltop.c.org	2009-02-05 09:54:23.000000000 +0100
+++ kerneltop.c	2009-02-07 11:53:59.000000000 +0100
@@ -199,6 +199,7 @@ static unsigned long		filter_start;
 static unsigned long		filter_end;
 
 static int			delay_secs			=  2;
+static int			zero;
 static int			dump_symtab;
 
 struct source_line {
@@ -232,7 +233,8 @@ static void display_help(void)
 	" -f CNT --filter=CNT          # min-event-count filter          [default: 100]\n\n"
 	" -x path   --vmlinux=<path>   # the vmlinux binary, for -s use:\n"
 	" -s symbol --symbol=<symbol>  # function to be showed annotated one-shot\n"
-	" -D 1 --dump_symtab=1         # dump symbol table to stderr on startup\n"
+	" -z 1      --zero             # zero counts after display\n"
+	" -D 1      --dump_symtab      # dump symbol table to stderr on startup\n"
 	"\n");
 
 	exit(0);
@@ -257,10 +259,11 @@ static void process_options(int argc, ch
 			{"pid",		required_argument,	NULL, 'p'},
 			{"vmlinux",	required_argument,	NULL, 'x'},
 			{"symbol",	required_argument,	NULL, 's'},
+			{"zero",	no_argument,		NULL, 'z'},
 			{"dump_symtab",	required_argument,	NULL, 'D'},
 			{NULL,		0,			NULL,  0 }
 		};
-		int c = getopt_long(argc, argv, "c:C:d:e:f:g:n:p:s:x:D:",
+		int c = getopt_long(argc, argv, "c:C:d:e:f:g:n:p:s:x:z:D:",
 				    long_options, &option_index);
 		if (c == -1)
 			break;
@@ -284,6 +287,7 @@ static void process_options(int argc, ch
 		case 'n': nmi				=   atoi(optarg); break;
 		case 'p': tid				=   atoi(optarg); break;
 		case 's': sym_filter			= strdup(optarg); break;
+		case 'z': zero				=              1; break;
 		case 'x': vmlinux			= strdup(optarg); break;
 		case 'D': dump_symtab			=   atoi(optarg); break;
 		default: error = 1; break;
@@ -314,6 +318,7 @@ struct sym_entry {
 	unsigned long long	addr;
 	char			*sym;
 	unsigned long		count[MAX_COUNTERS];
+	int			skip;
 	GList			*source;
 };
 
@@ -357,8 +362,6 @@ static long			events;
 static long			userspace_events;
 static const char		CONSOLE_CLEAR[] = "^[[H^[[2J";
 
-#define USE_POLL		0
-
 static struct sym_entry		tmp[MAX_SYMS];
 
 static void print_sym_table(void)
@@ -445,7 +448,7 @@ static void print_sym_table(void)
 		 * Add decay to the counts:
 		 */
 		for (count = 0; count < nr_counters; count++)
-			sym_table[i].count[count] = sym_table[i].count[count] * 7 / 8;
+			sym_table[i].count[count] = zero ? 0 : sym_table[i].count[count] * 7 / 8;
 	}
 
 	if (sym_filter_entry)
@@ -489,18 +492,13 @@ static int read_symbol(FILE *in, struct
 
 	sym = str;
 
-	if (!strcmp(sym, "_text")) {
-		min_ip = s->addr;
+	/* Filter out known duplicates and non-text symbols. */
+	if (!strcmp(sym, "_text"))
 		return 1;
-	}
-	if (!min_ip && !strcmp(sym, "_stext")) {
-		min_ip = s->addr;
+	if (!min_ip && !strcmp(sym, "_stext"))
 		return 1;
-	}
 	if (!strcmp(sym, "_etext") || !strcmp(sym, "_sinittext"))
 		return 1;
-
-	/* Filter out known duplicates and non-text symbols. */
 	if (stype != 'T' && stype != 't')
 		return 1;
 	if (!strncmp("init_module", sym, 11) || !strncmp("cleanup_module", sym, 14))
@@ -512,6 +510,13 @@ static int read_symbol(FILE *in, struct
 	assert(s->sym);
 
 	strcpy((char *)s->sym, str);
+	s->skip = 0;
+
+	/* Tag events to be skipped. */
+	if (!strcmp("default_idle", s->sym) || !strcmp("cpu_idle", s->sym))
+		s->skip = 1;
+	if (!strcmp("enter_idle", s->sym) || !strcmp("exit_idle", s->sym))
+		s->skip = 1;
 
 	if (filter_match == 1) {
 		filter_end = s->addr;
@@ -783,18 +788,14 @@ static void record_ip(uint64_t ip, int c
 
 	idx = left_idx;
 
-	sym_table[idx].count[counter]++;
+	if (!sym_table[idx].skip)
+		sym_table[idx].count[counter]++;
+	else events--;
 }
 
-static const int event_threshold = 10000;
-
 static void process_event(uint64_t ip, int counter)
 {
 	events++;
-	if (time(NULL) >= last_refresh + delay_secs) {
-		print_sym_table();
-		events = userspace_events = 0;
-	}
 
 	if (ip < min_ip || ip > max_ip) {
 		userspace_events++;
@@ -813,9 +814,7 @@ int main(int argc, char *argv[])
 	unsigned int cpu;
 	uint64_t ip;
 	ssize_t res;
-#if USE_POLL
 	int ret;
-#endif
 
 	process_options(argc, argv);
 
@@ -840,6 +839,7 @@ int main(int argc, char *argv[])
 			hw_event.nmi		= nmi;
 
 			fd[i][counter] = sys_perf_counter_open(&hw_event, tid, cpu, group_fd);
+			fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
 			if (fd[i][counter] < 0) {
 				printf("kerneltop error: syscall returned with %d (%s)\n",
 					fd[i][counter], strerror(-fd[i][counter]));
@@ -868,23 +868,27 @@ int main(int argc, char *argv[])
 	last_refresh = time(NULL);
 
 	while (1) {
-#if USE_POLL
-		ret = poll(event_array, nr_cpus, 1000);
-#endif
+		int hits = events;
 
 		for (i = 0; i < nr_cpus; i++) {
 			for (counter = 0; counter < nr_counters; counter++) {
-#if USE_POLL
-				if (!event_array[i][counter].revents)
-					continue;
-#endif
-
 				res = read(fd[i][counter], (char *) &ip, sizeof(ip));
-				assert(res == sizeof(ip));
+				if (res > 0) {
+					assert(res == sizeof(ip));
 
-				process_event(ip, counter);
+					process_event(ip, counter);
+				}
 			}
 		}
+
+		if (time(NULL) >= last_refresh + delay_secs) {
+			print_sym_table();
+			events = userspace_events = 0;
+		}
+
+		if (hits == events)
+			ret = poll(event_array, nr_cpus, 1000);
+		hits = events;
 	}
 
 	return 0;




* Re: kerneltop "enhancements"
  2009-02-07 11:47 ` kerneltop "enhancements" Mike Galbraith
@ 2009-02-08  1:17   ` Paul Mackerras
  0 siblings, 0 replies; 4+ messages in thread
From: Paul Mackerras @ 2009-02-08  1:17 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Peter Zijlstra, Arjan van de Ven, Thomas Gleixner,
	LKML

Mike Galbraith writes:

> BTW, how does one convince getopt_long() that an option really really
> doesn't require an argument?  no_argument works for the long version,
> but it insists that an argument is required for the short version
> regardless.

Take out the ":" after the "z" in the string argument to getopt_long.

Paul.


end of thread, other threads:[~2009-02-08  1:17 UTC | newest]

Thread overview: 4+ messages
2009-02-05  6:00 [PATCH] perfcounters: fix "perf counters kills oprofile" bug Mike Galbraith
2009-02-05 14:26 ` Ingo Molnar
2009-02-07 11:47 ` kerneltop "enhancements" Mike Galbraith
2009-02-08  1:17   ` Paul Mackerras
