From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: megasas stops I/O when running kernel as dom0 under xen4.1/4.2
Date: Wed, 24 Aug 2011 13:09:19 -0400
Message-ID: <20110824170919.GA14696@dumpdata.com>
References: <4E4916A3.9070106@leuphana.de> <4E4E56EE.2070801@citrix.com> <4E4E705E.3040505@leuphana.de> <4E4E79E8.3020808@citrix.com> <4E4E913C.40809@leuphana.de> <4E4E9423.8010904@citrix.com> <4E4EA725.30405@leuphana.de> <4E521BD5.2040609@citrix.com> <4E54E93D.7060301@citrix.com> <4E552D62.2000002@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <4E552D62.2000002@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andrew Cooper
Cc: Andreas Olsowski , Keir@rcsinet13.oracle.com, "xen-devel@lists.xensource.com" , Fraser
List-Id: xen-devel@lists.xenproject.org

On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
> On 24/08/11 13:06, Andrew Cooper wrote:
> > On 22/08/11 10:05, Andrew Cooper wrote:
> >> On 19/08/11 19:10, Andreas Olsowski wrote:
> >>> On 19.08.2011 18:49, Andrew Cooper wrote:
> >>>
> >>>> The only change you need to make is in megasas_probe_one() in
> >>>> megaraid_sas_base.c
> >>>>
> >>>> Add a call to pci_enable_msi(pdev) immediately after the current call to
> >>>> pci_set_master(pdev);
> >>>>
> >>>> ~Andrew
> >>>>
> >>> Yep, that works fine. Removed the module option as well.
> >>>
> >>> root@tarballerina:~# cat /proc/interrupts |grep mega
> >>> 2236: 69010 0 0 0 0 0 0 0 xen-pirq-msi megasas
> >>>
> >>> The same procedure that would have led to almost instant errors has
> >>> not brought them to appear again.
> >>>
> >> Good. This is what we are seeing as well. I am still awaiting a reply
> >> from LSI on this topic.
> >>
> >> Unfortunately, this does point to a regression in the way Xen deals with
> >> legacy interrupts.
> > Out of interest, on all 3 of your boxes with the megaraid_sas cards,
> > could you gather the io_apic information?
> >
> > It is the 'z' Xen debug key on the serial console (or alternatively put
> > apic_verbosity=debug on the Xen command line and the information gets
> > dumped into the dmesg)
>
> You can ignore this - it is not relevant.
>
> I have narrowed the problem to a bug in the interrupt migration code.

Goodies!
>
> The bug occurs when the move pending flag is set, and somehow another
> interrupt comes in on the old pcpu without triggering the move
> completion code. This leaves the IO_APIC with an ack'd but not EOI'd
> interrupt from the megaraid_sas device.

Ah, so the interrupt is delivered to Dom0 on the old per_cpu event, which
is ignored. Ignored b/c we have rebound the event channel to the other
CPU, right?

Is there any code in the hypervisor to turn off interrupt migration?
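
For anyone following along, the failure mode Andrew describes can be pictured with a small user-space toy model. This is only a sketch under stated assumptions: the struct, the function names, and the handle_old_cpu switch are all hypothetical and are not taken from Xen or the kernel. It assumes a level-triggered line that delivers nothing further once an interrupt has been ack'd but not EOI'd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not Xen's data structures. */
struct toy_irq {
    int      cpu;          /* CPU the line is routed to now */
    int      old_cpu;      /* CPU it is moving away from, -1 if none */
    bool     move_pending; /* migration started, not yet completed */
    bool     in_service;   /* ack'd, EOI not yet sent */
    unsigned delivered;    /* interrupts that reached a handler */
};

void toy_irq_init(struct toy_irq *irq, int cpu)
{
    assert(irq != NULL);
    irq->cpu = cpu;
    irq->old_cpu = -1;
    irq->move_pending = false;
    irq->in_service = false;
    irq->delivered = 0;
}

/* Begin moving the line to new_cpu; the move completes on the
 * first interrupt that arrives on new_cpu. */
void toy_irq_start_move(struct toy_irq *irq, int new_cpu)
{
    assert(irq != NULL);
    irq->old_cpu = irq->cpu;
    irq->cpu = new_cpu;
    irq->move_pending = true;
}

/* The device raises the line and the interrupt lands on arrival_cpu.
 * handle_old_cpu selects whether an arrival on the old CPU during a
 * pending move is still serviced (assumed "fixed" behaviour) or is
 * ignored without an EOI (the behaviour described in the thread).
 * Returns true if the interrupt reached a handler. */
bool toy_irq_raise(struct toy_irq *irq, int arrival_cpu, bool handle_old_cpu)
{
    assert(irq != NULL);

    if (irq->in_service)
        return false;            /* line is blocked until an EOI */

    irq->in_service = true;      /* ack */

    if (arrival_cpu == irq->cpu) {
        /* Arrival on the target CPU completes any pending move. */
        irq->move_pending = false;
        irq->old_cpu = -1;
    } else if (!(irq->move_pending && arrival_cpu == irq->old_cpu &&
                 handle_old_cpu)) {
        return false;            /* ignored: ack'd but never EOI'd */
    }

    irq->delivered++;
    irq->in_service = false;     /* EOI */
    return true;
}
```

With handle_old_cpu set to false, one interrupt arriving on the old CPU while the move is pending leaves in_service stuck, and every later toy_irq_raise() call returns false. That would match I/O stopping outright rather than merely slowing down.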