From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Keir Fraser <keir@xen.org>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"winston.l.wang@intel.com" <winston.l.wang@intel.com>,
"gang.wei@intel.com" <gang.wei@intel.com>
Subject: Re: Debugging a weird hardware fault.
Date: Tue, 2 Aug 2011 15:56:54 +0100
Message-ID: <4E381036.4020003@citrix.com>
In-Reply-To: <CA5D5710.18DF4%keir@xen.org>
On 02/08/11 15:26, Keir Fraser wrote:
> On 02/08/2011 07:14, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>
>> Just for information, this turned out to be a BIOS bug. It was setting
>> a 6 second timer when executing _PTS, which hit the system reset if
>> PM1{a,b} had not been hit when the timer expired. As Xen does all of
>> its shutdown after the call to _PTS and before PM1{a,b}, there is a
>> significant time gap, which was falling foul of the timer in most cases.
> Six seconds though, that's quite a long time! Is it a big box?
It is a NetScaler SDX box, designed to have 24 logical pcpus, 96GB of RAM
and 320 PCI-passed-through ixgbe virtual functions (claiming 3 IRQs per VF).
It seems that Xen spends a fair amount of time doing freeze_domains
(even though dom0 has already shut down all domUs, albeit forcibly if
they haven't shut down nicely within 15 seconds), and bringing down the
other CPUs (in particular, it spends ages fiddling around with IRQ
affinities).
Overall, there is probably quite a bit of optimization which could be
done, but that still doesn't excuse a BIOS deciding that "a long time"
as per the ACPI spec is "less than 6 seconds".
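
To make the ordering problem concrete, here is a rough sketch of the
timing, not anything Xen or the BIOS actually runs. The 6 second limit
is the BIOS timer from this thread; the Step class, the helper names
and the per-step durations are all hypothetical and are not
measurements from the box.

```python
from dataclasses import dataclass

# The BIOS timer armed by _PTS, as described in this thread.
BIOS_TIMEOUT_S = 6.0


@dataclass
class Step:
    name: str
    seconds: float  # hypothetical duration, not a measurement


def time_to_pm1_write(steps):
    """Total time spent between the _PTS call and the PM1{a,b} write."""
    return sum(step.seconds for step in steps)


def survives_bios_timer(steps, timeout_s=BIOS_TIMEOUT_S):
    """True if the PM1{a,b} write lands before the BIOS timer expires."""
    return time_to_pm1_write(steps) < timeout_s


# Work Xen does after _PTS and before PM1{a,b}. The durations are made up
# purely to illustrate how the gap can exceed the timer.
xen_shutdown = [
    Step("freeze_domains", 2.5),
    Step("bring down other CPUs / fix up IRQ affinities", 4.0),
]

if __name__ == "__main__":
    total = time_to_pm1_write(xen_shutdown)
    print(f"_PTS -> PM1 write: {total:.1f}s (BIOS limit {BIOS_TIMEOUT_S:.1f}s)")
    if survives_bios_timer(xen_shutdown):
        print("clean power-off")
    else:
        print("BIOS timer fires first and resets the box")
```

With these invented numbers the gap exceeds the timer, so the model
reports a reset. Shortening either step, or moving that work ahead of
_PTS, is what would bring the PM1{a,b} write back inside the window.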
~Andrew
>> In this case, it seems likely that a BIOS fix can be done, as Supermicro
>> do provide a custom BIOS for the NetScaler box in question.
>>
>> However, if anyone else comes across this issue, we did make a software
>> solution. You can replace /etc/init.d/halt (or the equivalent for your
>> chosen dom0 distro) with one that kexec-reboots into a native kernel
>> which listens for a special command line parameter and calls
>> pm_power_off_prepare() and pm_power_off() after the ACPI module has
>> initialized[1].
>>
>> This issue does however show that Xen itself is in breach of the ACPI
>> spec, which is a dangerous situation to be in given the fragility of
>> ACPI at the best of times. In due course, I will put my mind to solving
>> the dom0-Xen ACPI interaction problems if the question is still open.
> Yes, this is ultimately the issue. It's going to be a pain to fix properly,
> unfortunately.
>
> -- Keir
>
>
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com