From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Date: Fri, 26 Aug 2011 19:16:12 +0100 Message-ID: <4E57E2EC.4090406@citrix.com> References: <4E4916A3.9070106@leuphana.de> <4E4E56EE.2070801@citrix.com> <4E4E705E.3040505@leuphana.de> <4E4E79E8.3020808@citrix.com> <4E4E913C.40809@leuphana.de> <4E4E9423.8010904@citrix.com> <4E4EA725.30405@leuphana.de> <4E521BD5.2040609@citrix.com> <4E54E93D.7060301@citrix.com> <4E552D62.2000002@citrix.com> <20110824170919.GA14696@dumpdata.com> <4E5532C3.8090601@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------060900030700060907020800" Return-path: In-Reply-To: <4E5532C3.8090601@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Andreas Olsowski Cc: "xen-devel@lists.xensource.com" , Fraser , Konrad Rzeszutek Wilk List-Id: xen-devel@lists.xenproject.org --------------060900030700060907020800 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit On 24/08/11 18:20, Andrew Cooper wrote: > > On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote: >> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote: >>> On 24/08/11 13:06, Andrew Cooper wrote: >>>> On 22/08/11 10:05, Andrew Cooper wrote: >>>>> On 19/08/11 19:10, Andreas Olsowski wrote: >>>>>> Am 19.08.2011 18:49, schrieb Andrew Cooper: >>>>>> >>>>>>> The only change you need to make is in megasas_probe_one() in >>>>>>> megaraid_sas_base.c >>>>>>> >>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current >>>>>> call to >>>>>>> pci_set_master(pdev); >>>>>>> >>>>>>> ~Andrew >>>>>>> >>>>>> Yep, that works fine. Removed the module option as well. >>>>>> >>>>>> root@tarballerina:~# cat /proc/interrupts |grep mega >>>>>> 2236: 69010 0 0 0 0 >>>>>> 0 0 0 xen-pirq-msi megasas >>>>>> >>>>>> The same procedure that would have lead to almost instant errors has >>>>>> not brought them to appear again. >>>>>> >>>>> Good. This is what we are seeing as well. I am still awaiting a reply >>>>> from LSI on this topic. >>>>> >>>>> Unfortunately, this does point to a regression in the way Xen deals with >>>>> legacy interrupts. >>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards, >>>> could you gather the io_apic information? >>>> >>>> It is the z xen debug key on the serial console (or alternatively put >>>> apic_verbosity=debug on the xen commandline and the information gets >>>> dumped into the dmesg) >>> You can ignore this - it is not relevant. >>> >>> I have narrowed the problem to a bug in the interrupt migration code. >> Goodies! >>> The bug occurs when the move pending flag is set, and somehow another >>> interrupt comes in on the old pcpu without triggering the move >>> completion code. This leaves the IO_APIC with ack'd but not EOI'd >>> interrupt from the megaraid_sas device. >> Ah, so the interrupt is delievered to Dom0 on the old per_cpu >> event which is ignored. Ignored b/c we have rebinded the event channel >> to the other CPU, right? > The interrupt is not ignored - it seems to be being serviced by the > device driver in dom0. I will admit that my debugging code may be a > bit flaky - I started by trying to match IRQ35 (which is always claimed > by PCI INTA on this server - very useful for debugging) between do_IRQ > and its related PHYSDEVOP_eoi. > > I am currently trying to track the exact order of events around this > interrupt which misses the move completion code. > >> Is there any code in the Hypervisor to turn off interrupt migration code? > Not that I have found, although playing around with vcpu and task > pinning should work. My debugging shows that Xen-4.1.1 is migrating > this interrupt between PCPUs on average once every 4 real interrupts > when dom0 is under any load whatsoever. > Please try attached patch. It is a hack, but it works as far as I can test. (Patch is taken against xen-4.1.1 but should be trivial to port if it doesn't apply cleanly) ~Andrew -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com --------------060900030700060907020800 Content-Type: text/x-patch; name="debug-wip.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="debug-wip.patch" # HG changeset patch # Parent 589651f9a8a62fa25dc062b5b23a0ce947a259a5 diff -r 589651f9a8a6 xen/arch/x86/hpet.c --- a/xen/arch/x86/hpet.c Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/arch/x86/hpet.c Fri Aug 26 19:08:34 2011 +0100 @@ -325,7 +325,7 @@ static void hpet_msi_ack(unsigned int ir ack_APIC_irq(); } -static void hpet_msi_end(unsigned int irq) +static void hpet_msi_end(unsigned int irq, u8 vector) { } diff -r 589651f9a8a6 xen/arch/x86/i8259.c --- a/xen/arch/x86/i8259.c Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/arch/x86/i8259.c Fri Aug 26 19:08:34 2011 +0100 @@ -91,7 +91,7 @@ static unsigned int startup_8259A_irq(un return 0; /* never anything pending */ } -static void end_8259A_irq(unsigned int irq) +static void end_8259A_irq(unsigned int irq, u8 vector) { if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS))) enable_8259A_irq(irq); diff -r 589651f9a8a6 xen/arch/x86/io_apic.c --- a/xen/arch/x86/io_apic.c Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/arch/x86/io_apic.c Fri Aug 26 19:08:34 2011 +0100 @@ -1657,7 +1657,7 @@ static void mask_and_ack_level_ioapic_ir } } -static void end_level_ioapic_irq (unsigned int irq) +static void end_level_ioapic_irq (unsigned int irq, u8 vector) { unsigned long v; int i; @@ -1706,6 +1706,14 @@ static void end_level_ioapic_irq (unsign */ i = IO_APIC_VECTOR(irq); + /* CA-65000 - manually EOI the old vector if we are moving to the new */ + if ( vector && old != vector ) + { + int ioapic; + for (ioapic = 0; ioapic < nr_ioapics; ioapic++) + io_apic_eoi(ioapic, old); + } + v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1)); ack_APIC_irq(); @@ -1729,7 +1737,7 @@ static void disable_edge_ioapic_irq(unsi { } -static void end_edge_ioapic_irq(unsigned int irq) +static void end_edge_ioapic_irq(unsigned int irq, u8 vector) { } @@ -1780,7 +1788,7 @@ static void ack_msi_irq(unsigned int irq ack_APIC_irq(); /* ACKTYPE_NONE */ } -static void end_msi_irq(unsigned int irq) +static void end_msi_irq(unsigned int irq, u8 vector) { if ( !msi_maskable_irq(irq_desc[irq].msi_desc) ) ack_APIC_irq(); /* ACKTYPE_EOI */ @@ -1833,7 +1841,7 @@ static void ack_lapic_irq(unsigned int i ack_APIC_irq(); } -static void end_lapic_irq(unsigned int irq) { /* nothing */ } +static void end_lapic_irq(unsigned int irq, u8 vector) { /* nothing */ } static hw_irq_controller lapic_irq_type = { .typename = "local-APIC-edge", diff -r 589651f9a8a6 xen/arch/x86/irq.c --- a/xen/arch/x86/irq.c Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/arch/x86/irq.c Fri Aug 26 19:08:34 2011 +0100 @@ -349,6 +349,7 @@ static void __do_IRQ_guest(int vector); void no_action(int cpl, void *dev_id, struct cpu_user_regs *regs) { } static void enable_none(unsigned int vector) { } +static void end_none(unsigned int irq, u8 vector) { } static unsigned int startup_none(unsigned int vector) { return 0; } static void disable_none(unsigned int vector) { } static void ack_none(unsigned int irq) @@ -357,7 +358,6 @@ static void ack_none(unsigned int irq) } #define shutdown_none disable_none -#define end_none enable_none hw_irq_controller no_irq_type = { "none", @@ -385,6 +385,7 @@ int __assign_irq_vector(int irq, struct static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0; unsigned int old_vector; int cpu, err; + unsigned long flags; cpumask_t tmp_mask; /* If we already have a vector on one of the cpus in the mask, @@ -437,6 +438,7 @@ next: /* Found one! */ current_vector = vector; current_offset = offset; + local_irq_save(flags); if (old_vector) { cfg->move_in_progress = 1; cpus_copy(cfg->old_cpu_mask, cfg->cpu_mask); @@ -467,6 +469,7 @@ next: if (IO_APIC_IRQ(irq)) irq_vector[irq] = vector; err = 0; + local_irq_restore(flags); break; } return err; @@ -674,7 +677,7 @@ asmlinkage void do_IRQ(struct cpu_user_r desc->status &= ~IRQ_INPROGRESS; out: - desc->handler->end(irq); + desc->handler->end(irq, regs->entry_vector); out_no_end: spin_unlock(&desc->lock); irq_exit(); @@ -890,7 +893,7 @@ static void irq_guest_eoi_timer_fn(void switch ( action->ack_type ) { case ACKTYPE_UNMASK: - desc->handler->end(irq); + desc->handler->end(irq, 0); break; case ACKTYPE_EOI: cpu_eoi_map = action->cpu_eoi_map; @@ -921,7 +924,7 @@ static void __do_IRQ_guest(int irq) /* An interrupt may slip through while freeing an ACKTYPE_EOI irq. */ ASSERT(action->ack_type == ACKTYPE_EOI); ASSERT(desc->status & IRQ_DISABLED); - desc->handler->end(irq); + desc->handler->end(irq, vector); return; } @@ -1032,7 +1035,7 @@ static void flush_ready_eoi(void) ASSERT(irq > 0); desc = irq_to_desc(irq); spin_lock(&desc->lock); - desc->handler->end(irq); + desc->handler->end(irq, peoi[sp].vector); spin_unlock(&desc->lock); } @@ -1113,7 +1116,7 @@ static void __pirq_guest_eoi(struct doma if ( action->ack_type == ACKTYPE_UNMASK ) { ASSERT(cpus_empty(action->cpu_eoi_map)); - desc->handler->end(irq); + desc->handler->end(irq, 0); spin_unlock_irq(&desc->lock); return; } @@ -1373,7 +1376,7 @@ static irq_guest_action_t *__pirq_guest_ case ACKTYPE_UNMASK: if ( test_and_clear_bit(pirq, d->pirq_mask) && (--action->in_flight == 0) ) - desc->handler->end(irq); + desc->handler->end(irq, 0); break; case ACKTYPE_EOI: /* NB. If #guests == 0 then we clear the eoi_map later on. */ diff -r 589651f9a8a6 xen/drivers/passthrough/amd/iommu_init.c --- a/xen/drivers/passthrough/amd/iommu_init.c Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/drivers/passthrough/amd/iommu_init.c Fri Aug 26 19:08:34 2011 +0100 @@ -442,7 +442,7 @@ static unsigned int iommu_msi_startup(un return 0; } -static void iommu_msi_end(unsigned int irq) +static void iommu_msi_end(unsigned int irq, u8 vector) { iommu_msi_unmask(irq); ack_APIC_irq(); diff -r 589651f9a8a6 xen/drivers/passthrough/vtd/iommu.c --- a/xen/drivers/passthrough/vtd/iommu.c Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/drivers/passthrough/vtd/iommu.c Fri Aug 26 19:08:34 2011 +0100 @@ -978,7 +978,7 @@ static unsigned int dma_msi_startup(unsi return 0; } -static void dma_msi_end(unsigned int irq) +static void dma_msi_end(unsigned int irq, u8 vector) { dma_msi_unmask(irq); ack_APIC_irq(); diff -r 589651f9a8a6 xen/include/xen/irq.h --- a/xen/include/xen/irq.h Thu Aug 25 17:50:48 2011 +0100 +++ b/xen/include/xen/irq.h Fri Aug 26 19:08:34 2011 +0100 @@ -44,7 +44,7 @@ struct hw_interrupt_type { void (*enable)(unsigned int irq); void (*disable)(unsigned int irq); void (*ack)(unsigned int irq); - void (*end)(unsigned int irq); + void (*end)(unsigned int irq, u8 vector); void (*set_affinity)(unsigned int irq, cpumask_t mask); }; --------------060900030700060907020800 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --------------060900030700060907020800--