xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Andreas Olsowski <andreas.olsowski@leuphana.de>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Fraser <keir.xen@gmail.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: megasas stops I/O when running kernel as dom0 under xen4.1/4.2
Date: Fri, 26 Aug 2011 19:16:12 +0100	[thread overview]
Message-ID: <4E57E2EC.4090406@citrix.com> (raw)
In-Reply-To: <4E5532C3.8090601@citrix.com>

[-- Attachment #1: Type: text/plain, Size: 3156 bytes --]

On 24/08/11 18:20, Andrew Cooper wrote:
>
> On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote:
>> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
>>> On 24/08/11 13:06, Andrew Cooper wrote:
>>>> On 22/08/11 10:05, Andrew Cooper wrote:
>>>>> On 19/08/11 19:10, Andreas Olsowski wrote:
>>>>>> Am 19.08.2011 18:49, schrieb Andrew Cooper:
>>>>>>
>>>>>>> The only change you need to make is in megasas_probe_one() in
>>>>>>> megaraid_sas_base.c
>>>>>>>
>>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current
>>>>>> call to
>>>>>>> pci_set_master(pdev);
>>>>>>>
>>>>>>> ~Andrew
>>>>>>>
>>>>>> Yep, that works fine. Removed the module option as well.
>>>>>>
>>>>>> root@tarballerina:~# cat /proc/interrupts  |grep mega
>>>>>> 2236:      69010          0          0          0          0         
>>>>>> 0          0          0  xen-pirq-msi       megasas
>>>>>>
>>>>>> The same procedure that would have lead to almost instant errors has
>>>>>> not brought them to appear again.
>>>>>>
>>>>> Good.  This is what we are seeing as well.  I am still awaiting a reply
>>>>> from LSI on this topic.
>>>>>
>>>>> Unfortunately, this does point to a regression in the way Xen deals with
>>>>> legacy interrupts.
>>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards,
>>>> could you gather the io_apic information?
>>>>
>>>> It is the z xen debug key on the serial console (or alternatively put
>>>> apic_verbosity=debug on the xen commandline and the information gets
>>>> dumped into the dmesg)
>>> You can ignore this - it is not relevant.
>>>
>>> I have narrowed the problem to a bug in the interrupt migration code.
>> Goodies!
>>> The bug occurs when the move pending flag is set, and somehow another
>>> interrupt comes in on the old pcpu without triggering the move
>>> completion code.  This leaves the IO_APIC with ack'd but not EOI'd
>>> interrupt from the megaraid_sas device.
>> Ah, so the interrupt is delievered to Dom0 on the old per_cpu
>> event which is ignored. Ignored b/c we have rebinded the event channel
>> to the other CPU, right?
> The interrupt is not ignored - it seems to be being serviced by the
> device driver in dom0.   I will admit that my debugging code may be a
> bit flaky - I started by trying to match IRQ35 (which is always claimed
> by PCI INTA on this server - very useful for debugging) between do_IRQ
> and its related PHYSDEVOP_eoi.
>
> I am currently trying to track the exact order of events around this
> interrupt which misses the move completion code.
>
>> Is there any code in the Hypervisor to turn off interrupt migration code?
> Not that I have found, although playing around with vcpu and task
> pinning should work.  My debugging shows that Xen-4.1.1 is migrating
> this interrupt between PCPUs on average once every 4 real interrupts
> when dom0 is under any load whatsoever.
>

Please try attached patch.  It is a hack, but it works as far as I can test.

(Patch is taken against xen-4.1.1 but should be trivial to port if it
doesn't apply cleanly)

~Andrew

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: debug-wip.patch --]
[-- Type: text/x-patch, Size: 7214 bytes --]

# HG changeset patch
# Parent 589651f9a8a62fa25dc062b5b23a0ce947a259a5

diff -r 589651f9a8a6 xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/hpet.c	Fri Aug 26 19:08:34 2011 +0100
@@ -325,7 +325,7 @@ static void hpet_msi_ack(unsigned int ir
     ack_APIC_irq();
 }
 
-static void hpet_msi_end(unsigned int irq)
+static void hpet_msi_end(unsigned int irq, u8 vector)
 {
 }
 
diff -r 589651f9a8a6 xen/arch/x86/i8259.c
--- a/xen/arch/x86/i8259.c	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/i8259.c	Fri Aug 26 19:08:34 2011 +0100
@@ -91,7 +91,7 @@ static unsigned int startup_8259A_irq(un
     return 0; /* never anything pending */
 }
 
-static void end_8259A_irq(unsigned int irq)
+static void end_8259A_irq(unsigned int irq, u8 vector)
 {
     if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
         enable_8259A_irq(irq);
diff -r 589651f9a8a6 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/io_apic.c	Fri Aug 26 19:08:34 2011 +0100
@@ -1657,7 +1657,7 @@ static void mask_and_ack_level_ioapic_ir
     }
 }
 
-static void end_level_ioapic_irq (unsigned int irq)
+static void end_level_ioapic_irq (unsigned int irq, u8 vector)
 {
     unsigned long v;
     int i;
@@ -1706,6 +1706,14 @@ static void end_level_ioapic_irq (unsign
  */
     i = IO_APIC_VECTOR(irq);
 
+    /* CA-65000 - manually EOI the old vector if we are moving to the new */
+    if ( vector && old != vector )
+    {
+        int ioapic;
+        for (ioapic = 0; ioapic < nr_ioapics; ioapic++)
+            io_apic_eoi(ioapic, old);
+    }
+
     v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
 
     ack_APIC_irq();
@@ -1729,7 +1737,7 @@ static void disable_edge_ioapic_irq(unsi
 {
 }
 
-static void end_edge_ioapic_irq(unsigned int irq)
+static void end_edge_ioapic_irq(unsigned int irq, u8 vector)
  {
  }
 
@@ -1780,7 +1788,7 @@ static void ack_msi_irq(unsigned int irq
         ack_APIC_irq(); /* ACKTYPE_NONE */
 }
 
-static void end_msi_irq(unsigned int irq)
+static void end_msi_irq(unsigned int irq, u8 vector)
 {
     if ( !msi_maskable_irq(irq_desc[irq].msi_desc) )
         ack_APIC_irq(); /* ACKTYPE_EOI */
@@ -1833,7 +1841,7 @@ static void ack_lapic_irq(unsigned int i
     ack_APIC_irq();
 }
 
-static void end_lapic_irq(unsigned int irq) { /* nothing */ }
+static void end_lapic_irq(unsigned int irq, u8 vector) { /* nothing */ }
 
 static hw_irq_controller lapic_irq_type = {
     .typename 	= "local-APIC-edge",
diff -r 589651f9a8a6 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/irq.c	Fri Aug 26 19:08:34 2011 +0100
@@ -349,6 +349,7 @@ static void __do_IRQ_guest(int vector);
 void no_action(int cpl, void *dev_id, struct cpu_user_regs *regs) { }
 
 static void enable_none(unsigned int vector) { }
+static void end_none(unsigned int irq, u8 vector) { }
 static unsigned int startup_none(unsigned int vector) { return 0; }
 static void disable_none(unsigned int vector) { }
 static void ack_none(unsigned int irq)
@@ -357,7 +358,6 @@ static void ack_none(unsigned int irq)
 }
 
 #define shutdown_none   disable_none
-#define end_none        enable_none
 
 hw_irq_controller no_irq_type = {
     "none",
@@ -385,6 +385,7 @@ int __assign_irq_vector(int irq, struct 
     static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
     unsigned int old_vector;
     int cpu, err;
+    unsigned long flags;
     cpumask_t tmp_mask;
 
     /* If we already have a vector on one of the cpus in the mask,
@@ -437,6 +438,7 @@ next:
         /* Found one! */
         current_vector = vector;
         current_offset = offset;
+        local_irq_save(flags);
         if (old_vector) {
             cfg->move_in_progress = 1;
             cpus_copy(cfg->old_cpu_mask, cfg->cpu_mask);
@@ -467,6 +469,7 @@ next:
             if (IO_APIC_IRQ(irq))
                     irq_vector[irq] = vector;
         err = 0;
+        local_irq_restore(flags);
         break;
     }
     return err;
@@ -674,7 +677,7 @@ asmlinkage void do_IRQ(struct cpu_user_r
     desc->status &= ~IRQ_INPROGRESS;
 
  out:
-    desc->handler->end(irq);
+    desc->handler->end(irq, regs->entry_vector);
  out_no_end:
     spin_unlock(&desc->lock);
     irq_exit();
@@ -890,7 +893,7 @@ static void irq_guest_eoi_timer_fn(void 
     switch ( action->ack_type )
     {
     case ACKTYPE_UNMASK:
-        desc->handler->end(irq);
+        desc->handler->end(irq, 0);
         break;
     case ACKTYPE_EOI:
         cpu_eoi_map = action->cpu_eoi_map;
@@ -921,7 +924,7 @@ static void __do_IRQ_guest(int irq)
         /* An interrupt may slip through while freeing an ACKTYPE_EOI irq. */
         ASSERT(action->ack_type == ACKTYPE_EOI);
         ASSERT(desc->status & IRQ_DISABLED);
-        desc->handler->end(irq);
+        desc->handler->end(irq, vector);
         return;
     }
 
@@ -1032,7 +1035,7 @@ static void flush_ready_eoi(void)
         ASSERT(irq > 0);
         desc = irq_to_desc(irq);
         spin_lock(&desc->lock);
-        desc->handler->end(irq);
+        desc->handler->end(irq, peoi[sp].vector);
         spin_unlock(&desc->lock);
     }
 
@@ -1113,7 +1116,7 @@ static void __pirq_guest_eoi(struct doma
     if ( action->ack_type == ACKTYPE_UNMASK )
     {
         ASSERT(cpus_empty(action->cpu_eoi_map));
-        desc->handler->end(irq);
+        desc->handler->end(irq, 0);
         spin_unlock_irq(&desc->lock);
         return;
     }
@@ -1373,7 +1376,7 @@ static irq_guest_action_t *__pirq_guest_
     case ACKTYPE_UNMASK:
         if ( test_and_clear_bit(pirq, d->pirq_mask) &&
              (--action->in_flight == 0) )
-            desc->handler->end(irq);
+            desc->handler->end(irq, 0);
         break;
     case ACKTYPE_EOI:
         /* NB. If #guests == 0 then we clear the eoi_map later on. */
diff -r 589651f9a8a6 xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_init.c	Fri Aug 26 19:08:34 2011 +0100
@@ -442,7 +442,7 @@ static unsigned int iommu_msi_startup(un
     return 0;
 }
 
-static void iommu_msi_end(unsigned int irq)
+static void iommu_msi_end(unsigned int irq, u8 vector)
 {
     iommu_msi_unmask(irq);
     ack_APIC_irq();
diff -r 589651f9a8a6 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/drivers/passthrough/vtd/iommu.c	Fri Aug 26 19:08:34 2011 +0100
@@ -978,7 +978,7 @@ static unsigned int dma_msi_startup(unsi
     return 0;
 }
 
-static void dma_msi_end(unsigned int irq)
+static void dma_msi_end(unsigned int irq, u8 vector)
 {
     dma_msi_unmask(irq);
     ack_APIC_irq();
diff -r 589651f9a8a6 xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/include/xen/irq.h	Fri Aug 26 19:08:34 2011 +0100
@@ -44,7 +44,7 @@ struct hw_interrupt_type {
     void (*enable)(unsigned int irq);
     void (*disable)(unsigned int irq);
     void (*ack)(unsigned int irq);
-    void (*end)(unsigned int irq);
+    void (*end)(unsigned int irq, u8 vector);
     void (*set_affinity)(unsigned int irq, cpumask_t mask);
 };
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

  reply	other threads:[~2011-08-26 18:16 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-11 13:59 megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Andreas Olsowski
2011-08-11 16:27 ` Simon Rowe
2011-08-11 22:51   ` Konrad Rzeszutek Wilk
2011-08-12  6:31     ` xen.frontend flag for higher display resolution (vnc) for HVM domU domains Mark Schneider
2011-08-12  7:26       ` Marc - A. Dahlhaus
2011-08-12  7:42     ` megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Simon Rowe
2011-08-12  9:11     ` Andreas Olsowski
2011-08-12  9:23       ` Simon Rowe
2011-08-15 10:49       ` Simon Rowe
2011-08-15 12:52         ` Andreas Olsowski
2011-08-19 12:28           ` Andrew Cooper
2011-08-19 14:17             ` Andreas Olsowski
2011-08-19 14:57               ` Andrew Cooper
2011-08-19 16:37                 ` Andreas Olsowski
2011-08-19 16:49                   ` Andrew Cooper
2011-08-19 18:10                     ` Andreas Olsowski
2011-08-22  9:05                       ` Andrew Cooper
2011-08-24 12:06                         ` Andrew Cooper
2011-08-24 16:57                           ` Andrew Cooper
2011-08-24 17:09                             ` Konrad Rzeszutek Wilk
2011-08-24 17:20                               ` Andrew Cooper
2011-08-26 18:16                                 ` Andrew Cooper [this message]
2011-08-26 18:32                                   ` Andrew Cooper
2011-08-30 12:02                                     ` Andreas Olsowski
2011-08-30 12:11                                       ` Andrew Cooper
2011-08-30 12:46                                         ` Keir Fraser
2011-08-12  9:02 ` Simon Rowe
2011-08-12 16:26   ` Pasi Kärkkäinen
2011-08-15  7:44     ` Simon Rowe
2011-08-12 16:25 ` Pasi Kärkkäinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E57E2EC.4090406@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=andreas.olsowski@leuphana.de \
    --cc=keir.xen@gmail.com \
    --cc=konrad.wilk@oracle.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).