From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Andreas Olsowski <andreas.olsowski@leuphana.de>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Fraser <keir.xen@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: megasas stops I/O when running kernel as dom0 under xen4.1/4.2
Date: Fri, 26 Aug 2011 19:16:12 +0100 [thread overview]
Message-ID: <4E57E2EC.4090406@citrix.com> (raw)
In-Reply-To: <4E5532C3.8090601@citrix.com>
[-- Attachment #1: Type: text/plain, Size: 3156 bytes --]
On 24/08/11 18:20, Andrew Cooper wrote:
>
> On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote:
>> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
>>> On 24/08/11 13:06, Andrew Cooper wrote:
>>>> On 22/08/11 10:05, Andrew Cooper wrote:
>>>>> On 19/08/11 19:10, Andreas Olsowski wrote:
>>>>>> Am 19.08.2011 18:49, schrieb Andrew Cooper:
>>>>>>
>>>>>>> The only change you need to make is in megasas_probe_one() in
>>>>>>> megaraid_sas_base.c
>>>>>>>
>>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current
>>>>>> call to
>>>>>>> pci_set_master(pdev);
>>>>>>>
>>>>>>> ~Andrew
>>>>>>>
>>>>>> Yep, that works fine. Removed the module option as well.
>>>>>>
>>>>>> root@tarballerina:~# cat /proc/interrupts |grep mega
>>>>>> 2236: 69010 0 0 0 0
>>>>>> 0 0 0 xen-pirq-msi megasas
>>>>>>
>>>>>> The same procedure that would have lead to almost instant errors has
>>>>>> not brought them to appear again.
>>>>>>
>>>>> Good. This is what we are seeing as well. I am still awaiting a reply
>>>>> from LSI on this topic.
>>>>>
>>>>> Unfortunately, this does point to a regression in the way Xen deals with
>>>>> legacy interrupts.
>>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards,
>>>> could you gather the io_apic information?
>>>>
>>>> It is the z xen debug key on the serial console (or alternatively put
>>>> apic_verbosity=debug on the xen commandline and the information gets
>>>> dumped into the dmesg)
>>> You can ignore this - it is not relevant.
>>>
>>> I have narrowed the problem to a bug in the interrupt migration code.
>> Goodies!
>>> The bug occurs when the move pending flag is set, and somehow another
>>> interrupt comes in on the old pcpu without triggering the move
>>> completion code. This leaves the IO_APIC with ack'd but not EOI'd
>>> interrupt from the megaraid_sas device.
>> Ah, so the interrupt is delievered to Dom0 on the old per_cpu
>> event which is ignored. Ignored b/c we have rebinded the event channel
>> to the other CPU, right?
> The interrupt is not ignored - it seems to be being serviced by the
> device driver in dom0. I will admit that my debugging code may be a
> bit flaky - I started by trying to match IRQ35 (which is always claimed
> by PCI INTA on this server - very useful for debugging) between do_IRQ
> and its related PHYSDEVOP_eoi.
>
> I am currently trying to track the exact order of events around this
> interrupt which misses the move completion code.
>
>> Is there any code in the Hypervisor to turn off interrupt migration code?
> Not that I have found, although playing around with vcpu and task
> pinning should work. My debugging shows that Xen-4.1.1 is migrating
> this interrupt between PCPUs on average once every 4 real interrupts
> when dom0 is under any load whatsoever.
>
Please try attached patch. It is a hack, but it works as far as I can test.
(Patch is taken against xen-4.1.1 but should be trivial to port if it
doesn't apply cleanly)
~Andrew
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
[-- Attachment #2: debug-wip.patch --]
[-- Type: text/x-patch, Size: 7214 bytes --]
# HG changeset patch
# Parent 589651f9a8a62fa25dc062b5b23a0ce947a259a5
diff -r 589651f9a8a6 xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/hpet.c Fri Aug 26 19:08:34 2011 +0100
@@ -325,7 +325,7 @@ static void hpet_msi_ack(unsigned int ir
ack_APIC_irq();
}
-static void hpet_msi_end(unsigned int irq)
+static void hpet_msi_end(unsigned int irq, u8 vector)
{
}
diff -r 589651f9a8a6 xen/arch/x86/i8259.c
--- a/xen/arch/x86/i8259.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/i8259.c Fri Aug 26 19:08:34 2011 +0100
@@ -91,7 +91,7 @@ static unsigned int startup_8259A_irq(un
return 0; /* never anything pending */
}
-static void end_8259A_irq(unsigned int irq)
+static void end_8259A_irq(unsigned int irq, u8 vector)
{
if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
enable_8259A_irq(irq);
diff -r 589651f9a8a6 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/io_apic.c Fri Aug 26 19:08:34 2011 +0100
@@ -1657,7 +1657,7 @@ static void mask_and_ack_level_ioapic_ir
}
}
-static void end_level_ioapic_irq (unsigned int irq)
+static void end_level_ioapic_irq (unsigned int irq, u8 vector)
{
unsigned long v;
int i;
@@ -1706,6 +1706,14 @@ static void end_level_ioapic_irq (unsign
*/
i = IO_APIC_VECTOR(irq);
+ /* CA-65000 - manually EOI the old vector if we are moving to the new */
+ if ( vector && old != vector )
+ {
+ int ioapic;
+ for (ioapic = 0; ioapic < nr_ioapics; ioapic++)
+ io_apic_eoi(ioapic, old);
+ }
+
v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
ack_APIC_irq();
@@ -1729,7 +1737,7 @@ static void disable_edge_ioapic_irq(unsi
{
}
-static void end_edge_ioapic_irq(unsigned int irq)
+static void end_edge_ioapic_irq(unsigned int irq, u8 vector)
{
}
@@ -1780,7 +1788,7 @@ static void ack_msi_irq(unsigned int irq
ack_APIC_irq(); /* ACKTYPE_NONE */
}
-static void end_msi_irq(unsigned int irq)
+static void end_msi_irq(unsigned int irq, u8 vector)
{
if ( !msi_maskable_irq(irq_desc[irq].msi_desc) )
ack_APIC_irq(); /* ACKTYPE_EOI */
@@ -1833,7 +1841,7 @@ static void ack_lapic_irq(unsigned int i
ack_APIC_irq();
}
-static void end_lapic_irq(unsigned int irq) { /* nothing */ }
+static void end_lapic_irq(unsigned int irq, u8 vector) { /* nothing */ }
static hw_irq_controller lapic_irq_type = {
.typename = "local-APIC-edge",
diff -r 589651f9a8a6 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/arch/x86/irq.c Fri Aug 26 19:08:34 2011 +0100
@@ -349,6 +349,7 @@ static void __do_IRQ_guest(int vector);
void no_action(int cpl, void *dev_id, struct cpu_user_regs *regs) { }
static void enable_none(unsigned int vector) { }
+static void end_none(unsigned int irq, u8 vector) { }
static unsigned int startup_none(unsigned int vector) { return 0; }
static void disable_none(unsigned int vector) { }
static void ack_none(unsigned int irq)
@@ -357,7 +358,6 @@ static void ack_none(unsigned int irq)
}
#define shutdown_none disable_none
-#define end_none enable_none
hw_irq_controller no_irq_type = {
"none",
@@ -385,6 +385,7 @@ int __assign_irq_vector(int irq, struct
static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
unsigned int old_vector;
int cpu, err;
+ unsigned long flags;
cpumask_t tmp_mask;
/* If we already have a vector on one of the cpus in the mask,
@@ -437,6 +438,7 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
+ local_irq_save(flags);
if (old_vector) {
cfg->move_in_progress = 1;
cpus_copy(cfg->old_cpu_mask, cfg->cpu_mask);
@@ -467,6 +469,7 @@ next:
if (IO_APIC_IRQ(irq))
irq_vector[irq] = vector;
err = 0;
+ local_irq_restore(flags);
break;
}
return err;
@@ -674,7 +677,7 @@ asmlinkage void do_IRQ(struct cpu_user_r
desc->status &= ~IRQ_INPROGRESS;
out:
- desc->handler->end(irq);
+ desc->handler->end(irq, regs->entry_vector);
out_no_end:
spin_unlock(&desc->lock);
irq_exit();
@@ -890,7 +893,7 @@ static void irq_guest_eoi_timer_fn(void
switch ( action->ack_type )
{
case ACKTYPE_UNMASK:
- desc->handler->end(irq);
+ desc->handler->end(irq, 0);
break;
case ACKTYPE_EOI:
cpu_eoi_map = action->cpu_eoi_map;
@@ -921,7 +924,7 @@ static void __do_IRQ_guest(int irq)
/* An interrupt may slip through while freeing an ACKTYPE_EOI irq. */
ASSERT(action->ack_type == ACKTYPE_EOI);
ASSERT(desc->status & IRQ_DISABLED);
- desc->handler->end(irq);
+ desc->handler->end(irq, vector);
return;
}
@@ -1032,7 +1035,7 @@ static void flush_ready_eoi(void)
ASSERT(irq > 0);
desc = irq_to_desc(irq);
spin_lock(&desc->lock);
- desc->handler->end(irq);
+ desc->handler->end(irq, peoi[sp].vector);
spin_unlock(&desc->lock);
}
@@ -1113,7 +1116,7 @@ static void __pirq_guest_eoi(struct doma
if ( action->ack_type == ACKTYPE_UNMASK )
{
ASSERT(cpus_empty(action->cpu_eoi_map));
- desc->handler->end(irq);
+ desc->handler->end(irq, 0);
spin_unlock_irq(&desc->lock);
return;
}
@@ -1373,7 +1376,7 @@ static irq_guest_action_t *__pirq_guest_
case ACKTYPE_UNMASK:
if ( test_and_clear_bit(pirq, d->pirq_mask) &&
(--action->in_flight == 0) )
- desc->handler->end(irq);
+ desc->handler->end(irq, 0);
break;
case ACKTYPE_EOI:
/* NB. If #guests == 0 then we clear the eoi_map later on. */
diff -r 589651f9a8a6 xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_init.c Fri Aug 26 19:08:34 2011 +0100
@@ -442,7 +442,7 @@ static unsigned int iommu_msi_startup(un
return 0;
}
-static void iommu_msi_end(unsigned int irq)
+static void iommu_msi_end(unsigned int irq, u8 vector)
{
iommu_msi_unmask(irq);
ack_APIC_irq();
diff -r 589651f9a8a6 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/drivers/passthrough/vtd/iommu.c Fri Aug 26 19:08:34 2011 +0100
@@ -978,7 +978,7 @@ static unsigned int dma_msi_startup(unsi
return 0;
}
-static void dma_msi_end(unsigned int irq)
+static void dma_msi_end(unsigned int irq, u8 vector)
{
dma_msi_unmask(irq);
ack_APIC_irq();
diff -r 589651f9a8a6 xen/include/xen/irq.h
--- a/xen/include/xen/irq.h Thu Aug 25 17:50:48 2011 +0100
+++ b/xen/include/xen/irq.h Fri Aug 26 19:08:34 2011 +0100
@@ -44,7 +44,7 @@ struct hw_interrupt_type {
void (*enable)(unsigned int irq);
void (*disable)(unsigned int irq);
void (*ack)(unsigned int irq);
- void (*end)(unsigned int irq);
+ void (*end)(unsigned int irq, u8 vector);
void (*set_affinity)(unsigned int irq, cpumask_t mask);
};
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
next prev parent reply other threads:[~2011-08-26 18:16 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-11 13:59 megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Andreas Olsowski
2011-08-11 16:27 ` Simon Rowe
2011-08-11 22:51 ` Konrad Rzeszutek Wilk
2011-08-12 6:31 ` xen.frontend flag for higher display resolution (vnc) for HVM domU domains Mark Schneider
2011-08-12 7:26 ` Marc - A. Dahlhaus
2011-08-12 7:42 ` megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Simon Rowe
2011-08-12 9:11 ` Andreas Olsowski
2011-08-12 9:23 ` Simon Rowe
2011-08-15 10:49 ` Simon Rowe
2011-08-15 12:52 ` Andreas Olsowski
2011-08-19 12:28 ` Andrew Cooper
2011-08-19 14:17 ` Andreas Olsowski
2011-08-19 14:57 ` Andrew Cooper
2011-08-19 16:37 ` Andreas Olsowski
2011-08-19 16:49 ` Andrew Cooper
2011-08-19 18:10 ` Andreas Olsowski
2011-08-22 9:05 ` Andrew Cooper
2011-08-24 12:06 ` Andrew Cooper
2011-08-24 16:57 ` Andrew Cooper
2011-08-24 17:09 ` Konrad Rzeszutek Wilk
2011-08-24 17:20 ` Andrew Cooper
2011-08-26 18:16 ` Andrew Cooper [this message]
2011-08-26 18:32 ` Andrew Cooper
2011-08-30 12:02 ` Andreas Olsowski
2011-08-30 12:11 ` Andrew Cooper
2011-08-30 12:46 ` Keir Fraser
2011-08-12 9:02 ` Simon Rowe
2011-08-12 16:26 ` Pasi Kärkkäinen
2011-08-15 7:44 ` Simon Rowe
2011-08-12 16:25 ` Pasi Kärkkäinen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E57E2EC.4090406@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=andreas.olsowski@leuphana.de \
--cc=keir.xen@gmail.com \
--cc=konrad.wilk@oracle.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).