From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Andreas Olsowski <andreas.olsowski@leuphana.de>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Fraser <keir.xen@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: megasas stops I/O when running kernel as dom0 under xen4.1/4.2
Date: Fri, 26 Aug 2011 19:16:12 +0100 [thread overview]
Message-ID: <4E57E2EC.4090406@citrix.com> (raw)
In-Reply-To: <4E5532C3.8090601@citrix.com>
[-- Attachment #1: Type: text/plain, Size: 3156 bytes --]
On 24/08/11 18:20, Andrew Cooper wrote:
>
> On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote:
>> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
>>> On 24/08/11 13:06, Andrew Cooper wrote:
>>>> On 22/08/11 10:05, Andrew Cooper wrote:
>>>>> On 19/08/11 19:10, Andreas Olsowski wrote:
>>>>>> Am 19.08.2011 18:49, schrieb Andrew Cooper:
>>>>>>
>>>>>>> The only change you need to make is in megasas_probe_one() in
>>>>>>> megaraid_sas_base.c
>>>>>>>
>>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current
>>>>>> call to
>>>>>>> pci_set_master(pdev);
>>>>>>>
>>>>>>> ~Andrew
>>>>>>>
>>>>>> Yep, that works fine. Removed the module option as well.
>>>>>>
>>>>>> root@tarballerina:~# cat /proc/interrupts |grep mega
>>>>>> 2236: 69010 0 0 0 0
>>>>>> 0 0 0 xen-pirq-msi megasas
>>>>>>
>>>>>> The same procedure that would have lead to almost instant errors has
>>>>>> not brought them to appear again.
>>>>>>
>>>>> Good. This is what we are seeing as well. I am still awaiting a reply
>>>>> from LSI on this topic.
>>>>>
>>>>> Unfortunately, this does point to a regression in the way Xen deals with
>>>>> legacy interrupts.
>>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards,
>>>> could you gather the io_apic information?
>>>>
>>>> It is the z xen debug key on the serial console (or alternatively put
>>>> apic_verbosity=debug on the xen commandline and the information gets
>>>> dumped into the dmesg)
>>> You can ignore this - it is not relevant.
>>>
>>> I have narrowed the problem to a bug in the interrupt migration code.
>> Goodies!
>>> The bug occurs when the move pending flag is set, and somehow another
>>> interrupt comes in on the old pcpu without triggering the move
>>> completion code. This leaves the IO_APIC with ack'd but not EOI'd
>>> interrupt from the megaraid_sas device.
>> Ah, so the interrupt is delievered to Dom0 on the old per_cpu
>> event which is ignored. Ignored b/c we have rebinded the event channel
>> to the other CPU, right?
> The interrupt is not ignored - it seems to be being serviced by the
> device driver in dom0. I will admit that my debugging code may be a
> bit flaky - I started by trying to match IRQ35 (which is always claimed
> by PCI INTA on this server - very useful for debugging) between do_IRQ
> and its related PHYSDEVOP_eoi.
>
> I am currently trying to track the exact order of events around this
> interrupt which misses the move completion code.
>
>> Is there any code in the Hypervisor to turn off interrupt migration code?
> Not that I have found, although playing around with vcpu and task
> pinning should work. My debugging shows that Xen-4.1.1 is migrating
> this interrupt between PCPUs on average once every 4 real interrupts
> when dom0 is under any load whatsoever.
>
Please try attached patch. It is a hack, but it works as far as I can test.
(Patch is taken against xen-4.1.1 but should be trivial to port if it
doesn't apply cleanly)
~Andrew
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
[-- Attachment #2: debug-wip.patch --]
[-- Type: text/x-patch, Size: 7214 bytes --]
# HG changeset patch
# Parent 589651f9a8a62fa25dc062b5b23a0ce947a259a5
diff -r 589651f9a8a6 xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/hpet.c Fri Aug 26 19:08:34 2011 +0100
@@ -325,7 +325,7 @@ static void hpet_msi_ack(unsigned int ir
ack_APIC_irq();
}
-static void hpet_msi_end(unsigned int irq)
+static void hpet_msi_end(unsigned int irq, u8 vector)
{
}
diff -r 589651f9a8a6 xen/arch/x86/i8259.c
--- a/xen/arch/x86/i8259.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/i8259.c Fri Aug 26 19:08:34 2011 +0100
@@ -91,7 +91,7 @@ static unsigned int startup_8259A_irq(un
return 0; /* never anything pending */
}
-static void end_8259A_irq(unsigned int irq)
+static void end_8259A_irq(unsigned int irq, u8 vector)
{
if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
enable_8259A_irq(irq);
diff -r 589651f9a8a6 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/io_apic.c Fri Aug 26 19:08:34 2011 +0100
@@ -1657,7 +1657,7 @@ static void mask_and_ack_level_ioapic_ir
}
}
-static void end_level_ioapic_irq (unsigned int irq)
+static void end_level_ioapic_irq (unsigned int irq, u8 vector)
{
unsigned long v;
int i;
@@ -1706,6 +1706,14 @@ static void end_level_ioapic_irq (unsign
*/
i = IO_APIC_VECTOR(irq);
+ /* CA-65000 - manually EOI the old vector if we are moving to the new */
+ if ( vector && old != vector )
+ {
+ int ioapic;
+ for (ioapic = 0; ioapic < nr_ioapics; ioapic++)
+ io_apic_eoi(ioapic, old);
+ }
+
v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
ack_APIC_irq();
@@ -1729,7 +1737,7 @@ static void disable_edge_ioapic_irq(unsi
{
}
-static void end_edge_ioapic_irq(unsigned int irq)
+static void end_edge_ioapic_irq(unsigned int irq, u8 vector)
{
}
@@ -1780,7 +1788,7 @@ static void ack_msi_irq(unsigned int irq
ack_APIC_irq(); /* ACKTYPE_NONE */
}
-static void end_msi_irq(unsigned int irq)
+static void end_msi_irq(unsigned int irq, u8 vector)
{
if ( !msi_maskable_irq(irq_desc[irq].msi_desc) )
ack_APIC_irq(); /* ACKTYPE_EOI */
@@ -1833,7 +1841,7 @@ static void ack_lapic_irq(unsigned int i
ack_APIC_irq();
}
-static void end_lapic_irq(unsigned int irq) { /* nothing */ }
+static void end_lapic_irq(unsigned int irq, u8 vector) { /* nothing */ }
static hw_irq_controller lapic_irq_type = {
.typename = "local-APIC-edge",
diff -r 589651f9a8a6 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/irq.c Fri Aug 26 19:08:34 2011 +0100
@@ -349,6 +349,7 @@ static void __do_IRQ_guest(int vector);
void no_action(int cpl, void *dev_id, struct cpu_user_regs *regs) { }
static void enable_none(unsigned int vector) { }
+static void end_none(unsigned int irq, u8 vector) { }
static unsigned int startup_none(unsigned int vector) { return 0; }
static void disable_none(unsigned int vector) { }
static void ack_none(unsigned int irq)
@@ -357,7 +358,6 @@ static void ack_none(unsigned int irq)
}
#define shutdown_none disable_none
-#define end_none enable_none
hw_irq_controller no_irq_type = {
"none",
@@ -385,6 +385,7 @@ int __assign_irq_vector(int irq, struct
static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
unsigned int old_vector;
int cpu, err;
+ unsigned long flags;
cpumask_t tmp_mask;
/* If we already have a vector on one of the cpus in the mask,
@@ -437,6 +438,7 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
+ local_irq_save(flags);
if (old_vector) {
cfg->move_in_progress = 1;
cpus_copy(cfg->old_cpu_mask, cfg->cpu_mask);
@@ -467,6 +469,7 @@ next:
if (IO_APIC_IRQ(irq))
irq_vector[irq] = vector;
err = 0;
+ local_irq_restore(flags);
break;
}
return err;
@@ -674,7 +677,7 @@ asmlinkage void do_IRQ(struct cpu_user_r
desc->status &= ~IRQ_INPROGRESS;
out:
- desc->handler->end(irq);
+ desc->handler->end(irq, regs->entry_vector);
out_no_end:
spin_unlock(&desc->lock);
irq_exit();
@@ -890,7 +893,7 @@ static void irq_guest_eoi_timer_fn(void
switch ( action->ack_type )
{
case ACKTYPE_UNMASK:
- desc->handler->end(irq);
+ desc->handler->end(irq, 0);
break;
case ACKTYPE_EOI:
cpu_eoi_map = action->cpu_eoi_map;
@@ -921,7 +924,7 @@ static void __do_IRQ_guest(int irq)
/* An interrupt may slip through while freeing an ACKTYPE_EOI irq. */
ASSERT(action->ack_type == ACKTYPE_EOI);
ASSERT(desc->status & IRQ_DISABLED);
- desc->handler->end(irq);
+ desc->handler->end(irq, vector);
return;
}
@@ -1032,7 +1035,7 @@ static void flush_ready_eoi(void)
ASSERT(irq > 0);
desc = irq_to_desc(irq);
spin_lock(&desc->lock);
- desc->handler->end(irq);
+ desc->handler->end(irq, peoi[sp].vector);
spin_unlock(&desc->lock);
}
@@ -1113,7 +1116,7 @@ static void __pirq_guest_eoi(struct doma
if ( action->ack_type == ACKTYPE_UNMASK )
{
ASSERT(cpus_empty(action->cpu_eoi_map));
- desc->handler->end(irq);
+ desc->handler->end(irq, 0);
spin_unlock_irq(&desc->lock);
return;
}
@@ -1373,7 +1376,7 @@ static irq_guest_action_t *__pirq_guest_
case ACKTYPE_UNMASK:
if ( test_and_clear_bit(pirq, d->pirq_mask) &&
(--action->in_flight == 0) )
- desc->handler->end(irq);
+ desc->handler->end(irq, 0);
break;
case ACKTYPE_EOI:
/* NB. If #guests == 0 then we clear the eoi_map later on. */
diff -r 589651f9a8a6 xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_init.c Fri Aug 26 19:08:34 2011 +0100
@@ -442,7 +442,7 @@ static unsigned int iommu_msi_startup(un
return 0;
}
-static void iommu_msi_end(unsigned int irq)
+static void iommu_msi_end(unsigned int irq, u8 vector)
{
iommu_msi_unmask(irq);
ack_APIC_irq();
diff -r 589651f9a8a6 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/drivers/passthrough/vtd/iommu.c Fri Aug 26 19:08:34 2011 +0100
@@ -978,7 +978,7 @@ static unsigned int dma_msi_startup(unsi
return 0;
}
-static void dma_msi_end(unsigned int irq)
+static void dma_msi_end(unsigned int irq, u8 vector)
{
dma_msi_unmask(irq);
ack_APIC_irq();
diff -r 589651f9a8a6 xen/include/xen/irq.h
--- a/xen/include/xen/irq.h Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/include/xen/irq.h Fri Aug 26 19:08:34 2011 +0100
@@ -44,7 +44,7 @@ struct hw_interrupt_type {
void (*enable)(unsigned int irq);
void (*disable)(unsigned int irq);
void (*ack)(unsigned int irq);
- void (*end)(unsigned int irq);
+ void (*end)(unsigned int irq, u8 vector);
void (*set_affinity)(unsigned int irq, cpumask_t mask);
};
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
next prev parent reply other threads:[~2011-08-26 18:16 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-11 13:59 megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Andreas Olsowski
2011-08-11 16:27 ` Simon Rowe
2011-08-11 22:51 ` Konrad Rzeszutek Wilk
2011-08-12 6:31 ` xen.frontend flag for higher display resolution (vnc) for HVM domU domains Mark Schneider
2011-08-12 7:26 ` Marc - A. Dahlhaus
2011-08-12 7:42 ` megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Simon Rowe
2011-08-12 9:11 ` Andreas Olsowski
2011-08-12 9:23 ` Simon Rowe
2011-08-15 10:49 ` Simon Rowe
2011-08-15 12:52 ` Andreas Olsowski
2011-08-19 12:28 ` Andrew Cooper
2011-08-19 14:17 ` Andreas Olsowski
2011-08-19 14:57 ` Andrew Cooper
2011-08-19 16:37 ` Andreas Olsowski
2011-08-19 16:49 ` Andrew Cooper
2011-08-19 18:10 ` Andreas Olsowski
2011-08-22 9:05 ` Andrew Cooper
2011-08-24 12:06 ` Andrew Cooper
2011-08-24 16:57 ` Andrew Cooper
2011-08-24 17:09 ` Konrad Rzeszutek Wilk
2011-08-24 17:20 ` Andrew Cooper
2011-08-26 18:16 ` Andrew Cooper [this message]
2011-08-26 18:32 ` Andrew Cooper
2011-08-30 12:02 ` Andreas Olsowski
2011-08-30 12:11 ` Andrew Cooper
2011-08-30 12:46 ` Keir Fraser
2011-08-12 9:02 ` Simon Rowe
2011-08-12 16:26 ` Pasi Kärkkäinen
2011-08-15 7:44 ` Simon Rowe
2011-08-12 16:25 ` Pasi Kärkkäinen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E57E2EC.4090406@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=andreas.olsowski@leuphana.de \
--cc=keir.xen@gmail.com \
--cc=konrad.wilk@oracle.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.