From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andreas Olsowski <andreas.olsowski@leuphana.de>,
"Keir@rcsinet13.oracle.com" <Keir@rcsinet13.oracle.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Fraser <keir.xen@gmail.com>
Subject: Re: megasas stops I/O when running kernel as dom0 under xen4.1/4.2
Date: Wed, 24 Aug 2011 18:20:03 +0100
Message-ID: <4E5532C3.8090601@citrix.com>
In-Reply-To: <20110824170919.GA14696@dumpdata.com>
On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
>> On 24/08/11 13:06, Andrew Cooper wrote:
>>> On 22/08/11 10:05, Andrew Cooper wrote:
>>>> On 19/08/11 19:10, Andreas Olsowski wrote:
>>>>> Am 19.08.2011 18:49, schrieb Andrew Cooper:
>>>>>
>>>>>> The only change you need to make is in megasas_probe_one() in
>>>>>> megaraid_sas_base.c
>>>>>>
>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current call
>>>>>> to pci_set_master(pdev);
>>>>>>
>>>>>> ~Andrew
>>>>>>
>>>>> Yep, that works fine. Removed the module option as well.
>>>>>
>>>>> root@tarballerina:~# cat /proc/interrupts |grep mega
>>>>> 2236: 69010 0 0 0 0 0 0 0 xen-pirq-msi megasas
>>>>>
>>>>> The same procedure that would have led to almost instant errors has
>>>>> not brought them to appear again.
>>>>>
>>>> Good. This is what we are seeing as well. I am still awaiting a reply
>>>> from LSI on this topic.
>>>>
>>>> Unfortunately, this does point to a regression in the way Xen deals with
>>>> legacy interrupts.
>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards,
>>> could you gather the io_apic information?
>>>
>>> It is the z xen debug key on the serial console (or alternatively put
>>> apic_verbosity=debug on the xen commandline and the information gets
>>> dumped into the dmesg)
>> You can ignore this - it is not relevant.
>>
>> I have narrowed the problem to a bug in the interrupt migration code.
> Goodies!
>> The bug occurs when the move pending flag is set, and somehow another
>> interrupt comes in on the old pcpu without triggering the move
>> completion code. This leaves the IO_APIC with ack'd but not EOI'd
>> interrupt from the megaraid_sas device.
> Ah, so the interrupt is delivered to Dom0 on the old per_cpu
> event which is ignored. Ignored b/c we have rebound the event channel
> to the other CPU, right?
The interrupt is not ignored - it appears to be serviced by the
device driver in dom0. I will admit that my debugging code may be a
bit flaky - I started by trying to match IRQ35 (which is always claimed
by PCI INTA on this server - very useful for debugging) between do_IRQ
and its related PHYSDEVOP_eoi.
I am currently trying to track the exact order of events around this
interrupt which misses the move completion code.
> Is there any code in the Hypervisor to turn off interrupt migration code?
Not that I have found, although playing around with vcpu and task
pinning should work. My debugging shows that Xen-4.1.1 is migrating
this interrupt between PCPUs on average once every 4 real interrupts
when dom0 is under any load whatsoever.
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Thread overview: 30+ messages
2011-08-11 13:59 megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Andreas Olsowski
2011-08-11 16:27 ` Simon Rowe
2011-08-11 22:51 ` Konrad Rzeszutek Wilk
2011-08-12 6:31 ` xen.frontend flag for higher display resolution (vnc) for HVM domU domains Mark Schneider
2011-08-12 7:26 ` Marc - A. Dahlhaus
2011-08-12 7:42 ` megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Simon Rowe
2011-08-12 9:11 ` Andreas Olsowski
2011-08-12 9:23 ` Simon Rowe
2011-08-15 10:49 ` Simon Rowe
2011-08-15 12:52 ` Andreas Olsowski
2011-08-19 12:28 ` Andrew Cooper
2011-08-19 14:17 ` Andreas Olsowski
2011-08-19 14:57 ` Andrew Cooper
2011-08-19 16:37 ` Andreas Olsowski
2011-08-19 16:49 ` Andrew Cooper
2011-08-19 18:10 ` Andreas Olsowski
2011-08-22 9:05 ` Andrew Cooper
2011-08-24 12:06 ` Andrew Cooper
2011-08-24 16:57 ` Andrew Cooper
2011-08-24 17:09 ` Konrad Rzeszutek Wilk
2011-08-24 17:20 ` Andrew Cooper [this message]
2011-08-26 18:16 ` Andrew Cooper
2011-08-26 18:32 ` Andrew Cooper
2011-08-30 12:02 ` Andreas Olsowski
2011-08-30 12:11 ` Andrew Cooper
2011-08-30 12:46 ` Keir Fraser
2011-08-12 9:02 ` Simon Rowe
2011-08-12 16:26 ` Pasi Kärkkäinen
2011-08-15 7:44 ` Simon Rowe
2011-08-12 16:25 ` Pasi Kärkkäinen