From: George Dunlap <George.Dunlap@eu.citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Ian Campbell <Ian.Campbell@citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Subject: Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
Date: Thu, 14 Mar 2013 15:44:37 +0000
Message-ID: <CAFLBxZbguj39A8o8GRvEq5FFKRCeXXbFMqw9AqZ1TKxrZVCSNw@mail.gmail.com>
In-Reply-To: <CAFLBxZY+GPCPDK56CRH+C+d24F9yQmz3m8U36YYrTsAQ5NAmvg@mail.gmail.com>
On Thu, Mar 14, 2013 at 10:07 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Fri, Aug 17, 2012 at 3:00 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> So that translates to MSR_K7_EVNTSEL0.
>>
>> And that should only been shown once. Is the perf trying to load
>> the module over and over?
>
> So I've just tested this again with the latest wheezy kernel (3.2.0-4)
> but this time taking a closer look, I see this near the first
> instance:
>
>
> [ 0.072397] Performance Events: Broken BIOS detected, complain to
> your hardware vendor.
> [ 0.076015] [Firmware Bug]: the BIOS has corrupted hw-PMU resources
> (MSR c0010000 is 530076)
> [ 0.080007] AMD PMU driver.
> [ 0.082861] ------------[ cut here ]------------
> [ 0.084019] WARNING: at
> /build/buildd-linux_3.2.39-2-i386-4VFKqr/linux-3.2.39/arch/x86/xen/enlighten.c:738
> perf_events_lapic_init+0x28/0x29()
> [ 0.088009] Hardware name: empty
> [ 0.091294] Modules linked in:
> [ 0.092268] Pid: 1, comm: swapper/0 Not tainted 3.2.0-4-686-pae #1
> Debian 3.2.39-2
> [ 0.096009] Call Trace:
> [ 0.098531] [<c10383c4>] ? warn_slowpath_common+0x68/0x79
> [ 0.100011] [<c101536e>] ? perf_events_lapic_init+0x28/0x29
> [ 0.104012] [<c10383e2>] ? warn_slowpath_null+0xd/0x10
> [ 0.108011] [<c101536e>] ? perf_events_lapic_init+0x28/0x29
> [ 0.112016] [<c1421ac7>] ? init_hw_perf_events+0x223/0x3b1
> [ 0.116012] [<c14218a4>] ? check_bugs+0x1d9/0x1d9
> [ 0.120014] [<c1003074>] ? do_one_initcall+0x66/0x10e
> [ 0.124012] [<c141a781>] ? kernel_init+0x79/0x131
> [ 0.128012] [<c141a708>] ? start_kernel+0x32a/0x32a
> [ 0.132013] [<c12c727e>] ? kernel_thread_helper+0x6/0x10
> [ 0.136020] ---[ end trace b828488e55b27a3e ]---
> [ 0.140015] ... version: 0
> [ 0.144011] ... bit width: 48
> [ 0.148012] ... generic registers: 4
> [ 0.152011] ... value mask: 0000ffffffffffff
> [ 0.156013] ... max period: 00007fffffffffff
> [ 0.160012] ... fixed-purpose events: 0
> [ 0.164013] ... event mask: 000000000000000f
> [ 0.168276] NMI watchdog enabled, takes one hw-pmu counter.
> (XEN) traps.c:2495:d0 Domain attempted WRMSR 00000000c0010004 from
> 0x0000ffff9af0c3ec to 0x0000fffb5adce6f0.
>
> So relating this back to the discussion about vpmu for guests, it
> looks like maybe it's testing the performance counters, detecting that
> they're broken, but for some reason not actually disabling the NMI
> watchdog, and keeps on using them?
I'm guessing that the problem is in
arch/x86/kernel/cpu/perf_event.c:check_hw_exists(). It has two
failure modes -- "bios_fail" and "msr_fail". It does a check
where it tries to write and then read back the perfcounter MSRs to see
if they're functional; if that fails it will go to msr_fail and return
false. However, *before* it does that check, it does some other
checks which, if they fail, will jump right to bios_fail, skipping the
write/read check entirely. Judging by the log above, the PMU driver
still gets set up after the bios_fail message, so the counters stay in
use.
Really the "goto bios_fail" is wrong in all sorts of ways -- e.g., in
the first loop, if it detects that condition early on, it will
entirely miss the MSR checks for the remaining counters. I might just
propose a complete rewrite of that function...
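To make the control flow concrete, here is a rough userspace sketch of
what I mean. It is not the kernel code: struct fake_pmu, fake_wrmsr()
and both function names are made up for illustration, and the "writes
are silently dropped" behaviour is my assumption about what dom0 sees
under Xen. The first function mirrors the shape described above, where
the early jump to bios_fail skips the write/read test; the second
records the BIOS condition and still runs the test.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS    4
#define EVENTSEL_ENABLE (1ULL << 22)  /* enable bit of an event-select MSR */

/* Hypothetical stand-in for the PMU registers (not a kernel structure). */
struct fake_pmu {
    uint64_t evntsel[NUM_COUNTERS];  /* event-select registers */
    uint64_t counter[NUM_COUNTERS];  /* counter registers */
    bool writes_ignored;             /* assumption: WRMSR is dropped */
};

/* Hypothetical MSR write: has no effect when writes are ignored. */
static void fake_wrmsr(struct fake_pmu *p, int i, uint64_t val)
{
    if (!p->writes_ignored)
        p->counter[i] = val;
}

/* Shape of the existing logic: bios_fail bypasses the write/read test. */
static bool check_hw_exists_current(struct fake_pmu *p)
{
    const uint64_t val = 0xabcdULL;
    int i;

    for (i = 0; i < NUM_COUNTERS; i++)
        if (p->evntsel[i] & EVENTSEL_ENABLE)
            goto bios_fail;          /* remaining counters never looked at */

    fake_wrmsr(p, 0, val);
    if (p->counter[0] != val)
        goto msr_fail;
    return true;

bios_fail:
    return true;                     /* warn only; PMU is still used */
msr_fail:
    return false;                    /* fall back to software events */
}

/* One possible rework: note the BIOS condition, then keep checking. */
static bool check_hw_exists_reworked(struct fake_pmu *p, bool *bios_fail)
{
    const uint64_t val = 0xabcdULL;
    int i;

    *bios_fail = false;
    for (i = 0; i < NUM_COUNTERS; i++)
        if (p->evntsel[i] & EVENTSEL_ENABLE)
            *bios_fail = true;       /* remember it, do not bail out */

    fake_wrmsr(p, 0, val);
    return p->counter[0] == val;     /* write/read test always runs */
}
```

With evntsel[0] set to 0x530076 (the value from the log, which has the
enable bit set) and writes ignored, the first version reports the PMU
as usable while the second reports it as broken and also flags the
BIOS condition.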
-George
Thread overview: 7+ messages
2012-08-17 11:17 Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2 George Dunlap
2012-08-17 13:07 ` Konrad Rzeszutek Wilk
2012-08-17 13:18 ` George Dunlap
2012-08-17 13:58 ` Konrad Rzeszutek Wilk
2012-08-17 14:00 ` Konrad Rzeszutek Wilk
2013-03-14 10:07 ` George Dunlap
2013-03-14 15:44 ` George Dunlap [this message]