From: "Thimo E." <abc@digithi.de>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
"Dong, Eddie" <eddie.dong@intel.com>,
Xen-develList <xen-devel@lists.xen.org>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
"Zhang, Yang Z" <yang.z.zhang@intel.com>,
"Zhang, Xiantao" <xiantao.zhang@intel.com>
Subject: Re: cpuidle and un-eoid interrupts at the local apic
Date: Wed, 04 Sep 2013 21:56:40 +0200
Message-ID: <52279078.3030701@digithi.de>
In-Reply-To: <5227821A.9090201@citrix.com>

Hello Andrew,

thanks for your response. I have at least seen the trigger of the new
crash (vector 2e) before, so the two seem to belong together.

I can't imagine that I am the only one in the world who is using a Haswell
board. And as I haven't seen any other Xen bug/crash reports
like mine (and, once, yours), nor bug reports from users of other
operating systems, I ask myself whether only my hardware is buggy
or whether other operating systems handle those "spurious" interrupts in
a different way?

What does "ioapic_ack=old" change?

Best regards
Thimo
On 04.09.2013 20:55, Andrew Cooper wrote:
> On 04/09/13 19:32, Thimo E. wrote:
>> Hello again,
>>
>> for the last two weeks there was no crash with dom0_vcpus_pin and
>> dom0 restricted to 1 cpu. But yesterday it crashed again. So I changed
>> the command line again to:
>>
>> iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0
>> console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M
>> watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M
>> cpuid_mask_xsave_eax=0
>>
>> And today the server crashed again and produced a lot of debugging
>> messages, see attached. The "..." in the log files means that the
>> message above the dots was repeated very often.
>>
>> My summary so far:
>> - With only 1 cpu attached to dom0 the server was stable for 2 weeks;
>> the crash there did not really show any irq problems, see
>> crash20130903.txt
>> You can find Andrew's thoughts on this in
>> http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
>> - With more than 1 cpu and irqbalance the server produced the crashes
>> I've already posted before
>> - Without irqbalance it crashed with some different output, see
>> crash20130904.txt
>>
>> Next step is to change the network card.
>>
>> Zhang, any update from your side? Or do the others have any ideas?
>> Could "ioapic_ack=old" help somewhere?
>>
>> Best regards
>> Thimo
>>
> Ok - the second attachment (crash20130903.txt) is the one I have triaged
> before, and the crash is impossible given the expected code flow through
> the function.
>
> %r14 is calculated as the per-cpu cpu_info, which cannot possibly be
> -1 at the point of the fault. The only explanation is that the
> pagefault is a result of a spurious jump to this location.
>
> From a quick glance at the other crash, vector 2e was the problematic
> one (iirc). The "Bad vmexit (reason 3)" at the top would suggest that
> something on the system has sent an INIT to pcpu 2, which seems antisocial.
>
> As we have identified that the hardware is delivering invalid
> interrupts, I wouldn't necessarily read any more into this new crash;
> something is very broken in the hardware.
>
> I would be interested in any update from Intel regarding the ISR violation.
>
> ~Andrew