From: Ingo Molnar <mingo@elte.hu>
To: Johannes Stezenbach <js@sig21.net>,
Robert Richter <robert.richter@amd.com>,
Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <andi@firstfloor.org>,
x86@kernel.org, linux-kernel@vger.kernel.org,
"Rafael J. Wysocki" <rjw@sisk.pl>
Subject: Re: 2.6.31-rc5 regression: x86 MCE malfunction on Thinkpad T42p
Date: Mon, 10 Aug 2009 22:14:06 +0200
Message-ID: <20090810201406.GA6961@elte.hu>
In-Reply-To: <20090810192658.GA15513@sig21.net>
* Johannes Stezenbach <js@sig21.net> wrote:
> On Mon, Aug 10, 2009 at 03:29:23PM +0200, Ingo Molnar wrote:
> > * Johannes Stezenbach <js@sig21.net> wrote:
> > > On Mon, Aug 10, 2009 at 02:32:28PM +0200, Andi Kleen wrote:
> > > >
> > > > When the BIOS doesn't enable it then force enabling lapic might not work.
> > > >
> > > > This could cause either boot failures (obvious) or more subtle
> > > > problems like SMM doing something unexpected. Just saying that
> > > > if you have strange problems later first try disabling this
> > > > option again.
> > >
> > > Thanks for the heads-up. I remember I tried to use oprofile in
> > > the past on this machine and was disappointed that it only got the
> > > timer event. I'll keep lapic for now unless I see signs of
> > > instability.
> >
> > What's the output of something like 'perf stat true', and does 'perf
> > top' output something - i.e. do perfcounters work in general? Once
> > you get to that stage and it works then it should be fine.
>
> # ./perf stat true
>
> Performance counter stats for 'true':
>
>        0.985808  task-clock-msecs   #      0.779 CPUs
>               0  context-switches   #      0.000 M/sec
>               0  CPU-migrations     #      0.000 M/sec
>             110  page-faults        #      0.112 M/sec
>          583873  cycles             #    592.279 M/sec
>          500937  instructions       #      0.858 IPC
>   <not counted>  cache-references
>   <not counted>  cache-misses
>
> 0.001265524 seconds time elapsed
That looks almost normal - except for cache-references and
cache-misses, which are not counted. Could you send the output of
/proc/cpuinfo please?
>
>
> # ./perf top
> ------------------------------------------------------------------------------
> PerfTop: 172 irqs/sec kernel:43.6% [100000 cycles], (all, 1 CPUs)
> ------------------------------------------------------------------------------
>
> samples pcnt RIP kernel function
> ______ _______ _____ ________________ _______________
>
> 96.00 - 16.4% - 00000000c129be10 : acpi_pm_read
> 66.00 - 11.3% - 00000000c116fbb1 : delay_tsc
> 59.00 - 10.1% - 00000000c1172a83 : ioread32
> 26.00 - 4.4% - 00000000c116f567 : vsnprintf
> 21.00 - 3.6% - 00000000c11a4721 : acpi_os_read_port
> 20.00 - 3.4% - 00000000c136612c : schedule
> 19.00 - 3.2% - 00000000c10054b9 : mask_and_ack_8259A
> 18.00 - 3.1% - 00000000c11ce8bc : acpi_idle_enter_bm
> 17.00 - 2.9% - 00000000c1090dd1 : do_select
> 16.00 - 2.7% - 00000000c133f380 : unix_poll
> 14.00 - 2.4% - 00000000c116e2b4 : number
> 13.00 - 2.2% - 00000000c1002847 : sysenter_past_esp
> 10.00 - 1.7% - 00000000c1085b84 : fget_light
> 9.00 - 1.5% - 00000000c13666a7 : preempt_schedule
> 9.00 - 1.5% - 00000000c102cf1b : get_next_timer_interrupt
> ^C
Ok, this looks normal.
> First I tried oprofile while running an endless while loop in bash:
>
> # opreport
> CPU: Pentium M (P6 core), speed 1800 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (clocks processor is not halted, and not in a thermal trip) with a unit mask of 0x00 (No unit mask) count 100000
> CPU_CLK_UNHALT...|
> samples| %|
> ------------------
> 282940 76.5545 bash
> 78266 21.1763 libc-2.9.so
> 1730 0.4681 Xorg
> 1069 0.2892 oprofiled
>
> Looks plausible.
Yeah.
> But in dmesg I got this:
>
> Delta way too big! 18446744022868427516 ts=18446744022868427516 write stamp = 0
> ------------[ cut here ]------------
> WARNING: at kernel/trace/ring_buffer.c:1392 rb_reserve_next_event+0x150/0x309()
> Hardware name: 2373Y4M
> Modules linked in: ath5k mac80211 ath cfg80211 oprofile bnep sco rfcomm l2cap bluetooth ehci_hcd uhci_hc
> Pid: 13478, comm: opcontrol Not tainted 2.6.31-rc5 #5
> Call Trace:
> [<c10248dd>] warn_slowpath_common+0x60/0x90
> [<c102491a>] warn_slowpath_null+0xd/0x10
> [<c1054129>] rb_reserve_next_event+0x150/0x309
> [<c1068c06>] ? get_page_from_freelist+0x86/0x35a
> [<c10544f7>] ring_buffer_lock_reserve+0xe7/0x135
> [<f88622c0>] op_cpu_buffer_write_reserve+0x1a/0x4b [oprofile]
> [<f886239d>] op_add_code+0x57/0x98 [oprofile]
> [<c1068c06>] ? get_page_from_freelist+0x86/0x35a
> [<c1367c08>] ? page_fault+0x0/0x8
> [<f8862409>] log_sample+0x2b/0x6c [oprofile]
> [<c1064c76>] ? filemap_fault+0x74/0x32b
> [<f886249c>] oprofile_add_sample+0x3b/0x6b [oprofile]
> [<f88641d8>] ppro_check_ctrs+0x66/0xdb [oprofile]
> [<c1074bde>] ? __do_fault+0x303/0x32f
> [<f8863907>] profile_exceptions_notify+0x1f/0x26 [oprofile]
> [<c103953b>] notifier_call_chain+0x2b/0x55
> [<c10398e3>] __atomic_notifier_call_chain+0x1a/0x3a
> [<c103990f>] atomic_notifier_call_chain+0xc/0xe
> [<c103993e>] notify_die+0x2d/0x2f
> [<c1003c54>] do_nmi+0x63/0x222
> [<c1367d1d>] nmi_stack_correct+0x28/0x2d
> [<c1367c08>] ? page_fault+0x0/0x8
> ---[ end trace d174f39c63495e01 ]---
That's a new warning I haven't seen before - I've Cc:-ed Robert
(oprofile maintainer) and Steve (ftrace/ring-buffer maintainer) for
that.
The warning is probably harmless - oprofile sampling still works
fine, right?
Ingo
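
For reference, the delta printed in the "Delta way too big!" line above is
easier to read as a signed 64-bit value. The following is a minimal sketch,
not anything taken from the kernel: the helper name as_signed64 is
hypothetical, and the conversion to seconds assumes the ring-buffer
timestamps are in nanoseconds, which the thread does not state.

```python
U64 = 1 << 64  # number of distinct unsigned 64-bit values

def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed.

    Hypothetical helper for illustration only.
    """
    value &= U64 - 1
    return value - U64 if value >= (1 << 63) else value

# Values copied from the warning line in the quoted dmesg output.
ts = 18446744022868427516
write_stamp = 0

# Unsigned 64-bit subtraction wraps modulo 2**64.
delta = (ts - write_stamp) % U64
signed_delta = as_signed64(delta)

print(signed_delta)        # -50841124100
# Assumption: the unit is nanoseconds, so this is roughly -50.84 seconds.
print(signed_delta / 1e9)
```

With a write stamp of 0 the delta equals ts, which matches the two identical
numbers in the log line. Read as signed, the timestamp is a negative value
rather than a huge positive one.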
Thread overview (25+ messages):
2009-08-07 17:09 2.6.31-rc5 regression: x86 MCE malfunction on Thinkpad T42p Johannes Stezenbach
2009-08-09 10:03 ` Johannes Stezenbach
2009-08-09 10:34 ` Bartlomiej Zolnierkiewicz
2009-08-09 16:47 ` Johannes Stezenbach
2009-08-10 10:31 ` Andi Kleen
2009-08-10 12:27 ` Johannes Stezenbach
2009-08-10 12:32 ` Andi Kleen
2009-08-10 12:56 ` Johannes Stezenbach
2009-08-10 13:29 ` Ingo Molnar
2009-08-10 19:26 ` Johannes Stezenbach
2009-08-10 19:44 ` Andi Kleen
2009-08-10 20:05 ` Robert Richter
2009-08-10 20:14 ` Ingo Molnar [this message]
2009-08-10 20:37 ` Johannes Stezenbach
2009-08-10 21:31 ` Ingo Molnar
2009-08-10 22:13 ` Johannes Stezenbach
2009-08-11 9:34 ` [patch] cache-miss and cache-refs events on P6-mobile CPUs Ingo Molnar
2009-08-11 9:39 ` Peter Zijlstra
2009-08-11 11:06 ` Ingo Molnar
2009-08-11 11:21 ` Peter Zijlstra
2009-08-11 15:50 ` Johannes Stezenbach
2009-08-11 16:56 ` Ingo Molnar
2009-08-11 15:40 ` 2.6.31-rc5 regression: x86 MCE malfunction on Thinkpad T42p Johannes Stezenbach
2009-08-17 14:49 ` Steven Rostedt
2009-08-12 11:59 ` *PING* [PATCH]: x86: mce: fix mce warning with disabled lapic Ingo Molnar