* [PATCH 0/5] Add MSI support to XEN
@ 2008-03-27  6:55 Shan, Haitao
  2008-03-27  7:56 ` Keir Fraser
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Shan, Haitao @ 2008-03-27  6:55 UTC (permalink / raw)
  To: Keir Fraser, xen-devel; +Cc: Tian, Kevin, Jiang, Yunhong, Li, Xin B


Hi, Keir,
 
    These patches are a rebased version of Yunhong's original patches,
which were sent out before XEN 3.2 was released. They enable MSI
support and limited MSI-X support in XEN. Here is the original
description of the patches from Yunhong's mail.
 
The basic ideas are:
1) Keep the vector as a global resource owned by Xen, while splitting
pirq into per-domain information.
2) The Domain0 kernel will operate MSI resources for domain0/domU, while
QEMU will operate MSI resources for HVM domains.
3) Xen will do the EOI for MSI interrupts.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
 
    There are not many changes compared with the original patches. But
there are some issues on which we need your comments.
    1> The ACK-NEW method is necessary to avoid an IRQ storm, but it
causes a deadlock.
         During my tests, I did find that a deadlock can occur with the
patches applied. When a NIC device is assigned to an HVM domain, the
scenario is: Dom0 is waiting for an IDE interrupt (vector 0x21); the HVM
domain is waiting for qemu's IDE emulation and is thus blocked; the NIC
interrupt (MSI vector 0x31) is waiting to be injected into the HVM
domain, which is blocked; and the IDE interrupt is waiting behind the
NIC interrupt, since the NIC interrupt has higher priority but has not
yet been ACKed by XEN. When the IDE interrupt and the NIC interrupt are
delivered to the same CPU, and the guest OS is Vista, the problem is
easy to observe.
    2> Without ACK-NEW, some misbehaving NIC devices that we observed
cause IRQ storms. I think Yunhong can comment more on this. Basically,
writing EOI without masking the source of the MSI causes an IRQ storm.
Although the reason is still under investigation, XEN should handle such
a bogus device anyhow, right?
    3> Using ACK-OLD and masking the MSI when writing EOI can be a
solution. However, XEN does not own the PCI configuration space.
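
The deadlock in item 1> follows from the local APIC priority rule: while
a vector is in service (delivered but not yet EOIed), a pending vector
of the same or a lower priority class cannot be delivered. The following
is a minimal sketch of that rule, assuming the usual x86 convention that
the priority class is the upper four bits of the vector. The
`lapic_model` type and the `lapic_*` functions are hypothetical names
for illustration and are not Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one CPU's local APIC (hypothetical, not Xen's structures). */
struct lapic_model {
    uint8_t irr[32];   /* 256 bits: requested (pending) vectors */
    uint8_t isr[32];   /* 256 bits: in service, i.e. delivered but not EOIed */
};

static void set_bit8(uint8_t *map, unsigned v)
{
    map[v / 8] |= (uint8_t)(1u << (v % 8));
}

static void clear_bit8(uint8_t *map, unsigned v)
{
    map[v / 8] &= (uint8_t)~(1u << (v % 8));
}

static bool test_bit8(const uint8_t *map, unsigned v)
{
    return (map[v / 8] & (1u << (v % 8))) != 0;
}

/* Highest vector set in a bitmap, or -1 if the bitmap is empty. */
static int highest_vector(const uint8_t *map)
{
    for (int v = 255; v >= 0; v--)
        if (test_bit8(map, (unsigned)v))
            return v;
    return -1;
}

static void lapic_init(struct lapic_model *l)
{
    memset(l, 0, sizeof(*l));
}

/* A device interrupt or MSI write arrives: the vector becomes pending. */
static void lapic_raise(struct lapic_model *l, unsigned vector)
{
    assert(vector < 256);
    set_bit8(l->irr, vector);
}

/* Deliver the highest pending vector; returns it, or -1 if none or blocked. */
static int lapic_deliver(struct lapic_model *l)
{
    int pending = highest_vector(l->irr);
    int in_service = highest_vector(l->isr);

    if (pending < 0)
        return -1;
    /* Blocked unless the pending class is above the in-service class. */
    if (in_service >= 0 && (pending >> 4) <= (in_service >> 4))
        return -1;
    clear_bit8(l->irr, (unsigned)pending);
    set_bit8(l->isr, (unsigned)pending);
    return pending;
}

/* EOI retires the highest in-service vector. */
static void lapic_eoi(struct lapic_model *l)
{
    int in_service = highest_vector(l->isr);

    if (in_service >= 0)
        clear_bit8(l->isr, (unsigned)in_service);
}
```

In this model, once vector 0x31 has been delivered and is left without
an EOI, raising 0x21 and calling `lapic_deliver()` returns -1 until
`lapic_eoi()` runs. That is the position Dom0's IDE interrupt is in
while the EOI for the NIC vector waits on the blocked HVM guest.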
 
    We also tried some workarounds.
    One workaround might be to use a timer to force an EOI within some
time interval. This method is already implemented in the VT-d code.
However, once the timer fires and the EOI is written, this is
essentially the same approach as option 2.
    Another approach is to never deliver these two IRQs to the same CPU.
But this is really ugly and cannot be applied to UP systems.
    We have also considered using the VT-d 2 interrupt remapping
feature. According to the spec, there is no bit in the remapping table
to mask the interrupt. Therefore, it cannot be combined with option 2 to
solve the issue. Masking the interrupt still requires access to the PCI
configuration space.
 
    We think the cleanest method may be to move ownership of the PCI
configuration space from dom0 to the VMM. However, this is a big
change. It should be well discussed in the community and needs your
comments.
 
    This patch series can serve as discussion material. What are your
comments on the patches and the issues, Keir?
    
Thanks!
Haitao Shan
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-27  6:55 [PATCH 0/5] Add MSI support to XEN Shan, Haitao
@ 2008-03-27  7:56 ` Keir Fraser
  2008-03-27 17:32 ` Espen Skoglund
  2008-04-02 14:55 ` Neil Turton
  2 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-27  7:56 UTC (permalink / raw)
  To: Shan, Haitao, xen-devel; +Cc: Tian, Kevin, Jiang, Yunhong, Li, Xin B


Thanks,

I'll have to look at the patches regarding the per-domain pirq changes. That
sounds like it probably makes sense, but I seem to remember there were big
changes to the irq architecture and irq naming in the hypervisor in previous
iterations of these patches, which I didn't understand.

This IRQ storm issue still needs properly resolving. No one has yet explained
how a message-based interrupt source can cause an irq storm. Storms are
inherently a property of level-triggered sources, where ACK/EOI immediately
causes re-sampling of the interrupt line and re-assertion of the interrupt
at the CPU. How can anything similar happen with MSI? You (Intel) are
probably uniquely placed to answer this question, since you manufacture the
chipset and NIC which exhibit this problem.

 -- Keir

On 27/3/08 06:55, "Shan, Haitao" <haitao.shan@intel.com> wrote:

> The basic idea including:
> 1) Keep vector global resource owned by xen, while split pirq into per-domain
> information.
> 2) Domain0 kernel will operate msi resource for domain0/domU, while QEMU will
> operate MSI resource for HVM domain.
> 3) Xen will do EOI for MSI interrupt.
> 
> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
>  
>     There are no much changes made compared with the original patches. But
> there do have some issues that we need your kind comments.
>     1> ACK-NEW method is necessary to avoid IRQ storm. But it causes the
> deadlock. 
>          During my tests, I do find there can be deadlock with patches
> applied. When assigned a NIC device to HVM domain, the scenario is: Dom0 is
> waiting to IDE interrupt (vector 0x21); HVM domain is waiting for qemu's IDE
> emulation and thus blocked; NIC interrupt (MSI vector 0x31) is waiting for
> injection to HVM domain since it is blocked now; IDE interrupt is waiting for
> NIC interrupt since NIC interrupt is of high priority but not ACKed by XEN
> now. When IDE interrupt and NIC interrupt are delivered to the same CPU, and
> when guest OS is Vista, the phenomenon is easy to be observed.
>     2> Without ACK-NEW, some naughty NIC devices as we observed will bring IRQ
> storms. For this phenomenon, I think Yunhong can comment more. Basically,
> writing EOI without mask the source of MSI will bring IRQ storm. Although the
> reason is under investigation, XEN should anyhow handle such bogus device,
> right?
>     3> Using ACK-OLD and masking the MSI when writing EOI can be solution.
> However, XEN does not own PCI configuration spaces.
>  
>     We also tried some work arounds.
>     One work around might be using a timer to force a EOI within some time
> interval. This method is already implemented in VT-D's code. However, with
> this approach, if the timer is fired and EOI is written, this is essentially
> the same approach as option 2.
>     Another approach is to never deliver these two IRQs to the same CPU. But
> this is really ugly and can not be applied to UP.
>     We have also considered using VT-D 2 interrupt remapping feature.
> According to the spec, there is no bit in the remapping table to mask the
> interrupt. Therefore, this can not be combined with option 2 to solve the
> issue. Masking the interrupt still needs accessing PCI configuration spaces.
>  
>     We think the most clean method may be to move ownership from dom0 to VMM.
> However, this is a great change. This should be well discussed in community
> and need your comments.
>  
>     These patch series sent out can be served as a discussion materials. What
> is your comments on the patches and the issues, Keir?
>     
> Thanks!




* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-27  6:55 [PATCH 0/5] Add MSI support to XEN Shan, Haitao
  2008-03-27  7:56 ` Keir Fraser
@ 2008-03-27 17:32 ` Espen Skoglund
  2008-03-27 22:09   ` Caitlin Bestler
  2008-03-28  1:48   ` Jiang, Yunhong
  2008-04-02 14:55 ` Neil Turton
  2 siblings, 2 replies; 22+ messages in thread
From: Espen Skoglund @ 2008-03-27 17:32 UTC (permalink / raw)
  To: Shan, Haitao
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin, Jiang, Yunhong

Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
capability structure or MSI-X table within the interrupt handler is
insane.  It requires accesses over the PCI/PCIe bus and is clearly
something you want to avoid on the fast path.
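
To make the cost concrete, here is a minimal sketch of per-vector
masking through the MSI-X table. It relies on the MSI-X table layout
from the PCI specification (16-byte entries, with the Vector Control
dword at byte offset 12 and its bit 0 acting as the mask bit). The
function names are hypothetical, and the table is modelled as ordinary
memory; on real hardware it is MMIO behind a device BAR, so every read
and write below is a transaction across the bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_DWORDS       4u    /* each table entry is 16 bytes */
#define MSIX_VECTOR_CTRL_DWORD  3u    /* Vector Control: byte offset 12 */
#define MSIX_VECTOR_CTRL_MASK   0x1u  /* bit 0: per-vector mask bit */

/* Set or clear the mask bit of one MSI-X table entry. */
static void msix_set_mask(volatile uint32_t *table, unsigned entry, bool masked)
{
    assert(table != NULL);

    volatile uint32_t *ctrl =
        &table[entry * MSIX_ENTRY_DWORDS + MSIX_VECTOR_CTRL_DWORD];
    uint32_t val = *ctrl;   /* on hardware: a read across the bus */

    if (masked)
        val |= MSIX_VECTOR_CTRL_MASK;
    else
        val &= ~MSIX_VECTOR_CTRL_MASK;

    *ctrl = val;            /* on hardware: a posted write */
    (void)*ctrl;            /* read back so the posted write is flushed */
}

/* Report whether one MSI-X table entry is currently masked. */
static bool msix_is_masked(const volatile uint32_t *table, unsigned entry)
{
    assert(table != NULL);
    return (table[entry * MSIX_ENTRY_DWORDS + MSIX_VECTOR_CTRL_DWORD]
            & MSIX_VECTOR_CTRL_MASK) != 0;
}
```

A mask followed by an unmask therefore costs two reads, two writes and
two flushing reads per interrupt if done unconditionally in the handler,
which is the fast-path overhead being objected to.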

	eSk


[Haitao Shan]
>     There are no much changes made compared with the original patches.
> But there do have some issues that we need your kind comments.

>   1> ACK-NEW method is necessary to avoid IRQ storm. But it causes the
> deadlock. 
>          During my tests, I do find there can be deadlock with patches
> applied. When assigned a NIC device to HVM domain, the scenario is: Dom0
> is waiting to IDE interrupt (vector 0x21); HVM domain is waiting for
> qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector 0x31)
> is waiting for injection to HVM domain since it is blocked now; IDE
> interrupt is waiting for NIC interrupt since NIC interrupt is of high
> priority but not ACKed by XEN now. When IDE interrupt and NIC interrupt
> are delivered to the same CPU, and when guest OS is Vista, the
> phenomenon is easy to be observed.

>   2> Without ACK-NEW, some naughty NIC devices as we observed will
> bring IRQ storms. For this phenomenon, I think Yunhong can comment more.
> Basically, writing EOI without mask the source of MSI will bring IRQ
> storm. Although the reason is under investigation, XEN should anyhow
> handle such bogus device, right?

>   3> Using ACK-OLD and masking the MSI when writing EOI can be
> solution. However, XEN does not own PCI configuration spaces.


* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-27 17:32 ` Espen Skoglund
@ 2008-03-27 22:09   ` Caitlin Bestler
  2008-03-28  1:48   ` Jiang, Yunhong
  1 sibling, 0 replies; 22+ messages in thread
From: Caitlin Bestler @ 2008-03-27 22:09 UTC (permalink / raw)
  To: Espen Skoglund, Shan, Haitao
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin, Jiang, Yunhong

Espen Skoglund wrote:
> 
> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
> capability structure or MSI-X table within the interrupt handler is
> insane.  It requires accesses over the PCI/PCIe bus and is clearly
> something you want to avoid on the fast path.
> 
> 	eSk
> 

I agree. Interrupt mitigation schemes should already be part of the
host/device interface that is being assigned to the HVM guest. The
HVM guest should already know how to use it.

> 
> [Haitao Shan]
> >     There are no much changes made compared with the original patches.
> > But there do have some issues that we need your kind comments.
> 
> >   1> ACK-NEW method is necessary to avoid IRQ storm. But it causes the
> > deadlock.
> >          During my tests, I do find there can be deadlock with patches
> > applied. When assigned a NIC device to HVM domain, the scenario is: Dom0
> > is waiting to IDE interrupt (vector 0x21); HVM domain is waiting for
> > qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector 0x31)
> > is waiting for injection to HVM domain since it is blocked now; IDE
> > interrupt is waiting for NIC interrupt since NIC interrupt is of high
> > priority but not ACKed by XEN now. When IDE interrupt and NIC interrupt
> > are delivered to the same CPU, and when guest OS is Vista, the
> > phenomenon is easy to be observed.
> 
> >   2> Without ACK-NEW, some naughty NIC devices as we observed will
> > bring IRQ storms. For this phenomenon, I think Yunhong can comment more.
> > Basically, writing EOI without mask the source of MSI will bring IRQ
> > storm. Although the reason is under investigation, XEN should anyhow
> > handle such bogus device, right?
> 

Device assignment should deliver the device to the HVM, with all of its
warts as well as all of its features. Isn't the ultimate point is to use
the same driver in the HVM guest whether Xen is present or not?


* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-27 17:32 ` Espen Skoglund
  2008-03-27 22:09   ` Caitlin Bestler
@ 2008-03-28  1:48   ` Jiang, Yunhong
  2008-03-28  7:24     ` Keir Fraser
  1 sibling, 1 reply; 22+ messages in thread
From: Jiang, Yunhong @ 2008-03-28  1:48 UTC (permalink / raw)
  To: Espen Skoglund, Shan, Haitao
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin

We would not mask every time an interrupt happens. Instead, we would
mask only when a second interrupt arrives while the previous one is
still pending, much like handle_edge_irq() in upstream Linux.
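
The flow described here can be sketched as a small simulation. This is
only a model in the spirit of an edge-type flow handler, not the Linux
implementation: the `irq_model` structure, its counters and the
`arrivals_during_handler` field (which fakes the device firing again
while the handler runs) are all hypothetical.

```c
#include <assert.h>

#define IRQ_INPROGRESS 0x1u
#define IRQ_PENDING    0x2u
#define IRQ_MASKED     0x4u

/* Hypothetical per-interrupt state for the simulation. */
struct irq_model {
    unsigned status;
    unsigned handled;                 /* times the handler body ran */
    unsigned mask_writes;             /* times the source had to be masked */
    unsigned arrivals_during_handler; /* simulated re-fires while handling */
};

static void edge_irq_arrive(struct irq_model *d);

/* Handler body; re-injects the simulated arrivals while it is running. */
static void run_handler(struct irq_model *d)
{
    d->handled++;
    while (d->arrivals_during_handler > 0) {
        d->arrivals_during_handler--;
        edge_irq_arrive(d);
    }
}

static void edge_irq_arrive(struct irq_model *d)
{
    if (d->status & IRQ_INPROGRESS) {
        /* Slow path: a new edge arrived while the previous one is still
         * being handled. Remember it and mask the source once. */
        d->status |= IRQ_PENDING;
        if (!(d->status & IRQ_MASKED)) {
            d->status |= IRQ_MASKED;
            d->mask_writes++;
        }
        return;
    }

    /* Fast path: ack only, no access to the device's mask bit. */
    d->status |= IRQ_INPROGRESS;
    do {
        d->status &= ~IRQ_PENDING;
        if (d->status & IRQ_MASKED)
            d->status &= ~IRQ_MASKED;   /* unmask before replaying */
        run_handler(d);
    } while (d->status & IRQ_PENDING);
    d->status &= ~IRQ_INPROGRESS;
}
```

With no arrivals during the handler, the mask counter stays at zero, so
the common case pays nothing. With arrivals during the handler, the
source is masked once and the handler is replayed once afterwards.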

-- Yunhong Jiang

Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
> capability structure or MSI-X table within the interrupt handler is
> insane.  It requires accesses over the PCI/PCIe bus and is clearly
> something you want to avoid on the fast path.
> 
> 	eSk
> 
> 
> [Haitao Shan]
>>     There are no much changes made compared with the original patches.
>> But there do have some issues that we need your kind comments.
> 
>>   1> ACK-NEW method is necessary to avoid IRQ storm. But it causes the
>> deadlock. During my tests, I do find there can be deadlock with
>> patches applied. When assigned a NIC device to HVM domain, the scenario
>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is waiting
>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector 0x31)
>> is waiting for injection to HVM domain since it is blocked now; IDE
>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>> priority but not ACKed by XEN now. When IDE interrupt and NIC interrupt
>> are delivered to the same CPU, and when guest OS is Vista, the
>> phenomenon is easy to be observed.
> 
>>   2> Without ACK-NEW, some naughty NIC devices as we observed will
>> bring IRQ storms. For this phenomenon, I think Yunhong can comment more.
>> Basically, writing EOI without mask the source of MSI will bring IRQ
>> storm. Although the reason is under investigation, XEN should anyhow
>> handle such bogus device, right?
> 
>>   3> Using ACK-OLD and masking the MSI when writing EOI can be
>> solution. However, XEN does not own PCI configuration spaces.


* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-28  1:48   ` Jiang, Yunhong
@ 2008-03-28  7:24     ` Keir Fraser
  2008-03-28  8:40       ` Jiang, Yunhong
  2008-03-28 11:37       ` Espen Skoglund
  0 siblings, 2 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-28  7:24 UTC (permalink / raw)
  To: Jiang, Yunhong, Espen Skoglund, Shan, Haitao
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin

This requires the guest to call back into Xen to signal EOI (as we already
do for legacy level-triggered interrupts). We shouldn't really need to do
that for MSI and it's rather more expensive than a couple of accesses over
the PCI bus!

It's this callback into Xen, which we do not really understand why it's
needed, which I'm railing against. Is there some fundamental aspect of MSI
we do not understand, or are we working around one brain-dead or buggy
device?

 -- Keir

On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Not masking each time when interrupt happen, instead, we do that only
> when the second interrupt happen while the previous one is still
> pending, it should be something like handle_edge_irqs() in upstream
> linux.
> 
> -- Yunhong Jiang
> 
> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>> capability structure or MSI-X table within the interrupt handler is
>> insane.  It requires accesses over the PCI/PCIe bus and is clearly
>> something you want to avoid on the fast path.
>> 
>> eSk
>> 
>> 
>> [Haitao Shan]
>>>     There are no much changes made compared with the original patches.
>>> But there do have some issues that we need your kind comments.
>> 
>>>   1> ACK-NEW method is necessary to avoid IRQ storm. But it causes the
>>> deadlock. During my tests, I do find there can be deadlock with
>>> patches applied. When assigned a NIC device to HVM domain, the scenario
>>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is waiting
>>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector 0x31)
>>> is waiting for injection to HVM domain since it is blocked now; IDE
>>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>>> priority but not ACKed by XEN now. When IDE interrupt and NIC interrupt
>>> are delivered to the same CPU, and when guest OS is Vista, the
>>> phenomenon is easy to be observed.
>> 
>>>   2> Without ACK-NEW, some naughty NIC devices as we observed will
>>> bring IRQ storms. For this phenomenon, I think Yunhong can comment more.
>>> Basically, writing EOI without mask the source of MSI will bring IRQ
>>> storm. Although the reason is under investigation, XEN should anyhow
>>> handle such bogus device, right?
>> 
>>>   3> Using ACK-OLD and masking the MSI when writing EOI can be
>>> solution. However, XEN does not own PCI configuration spaces.

* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-28  7:24     ` Keir Fraser
@ 2008-03-28  8:40       ` Jiang, Yunhong
  2008-03-28  9:16         ` Keir Fraser
  2008-03-28 11:37       ` Espen Skoglund
  1 sibling, 1 reply; 22+ messages in thread
From: Jiang, Yunhong @ 2008-03-28  8:40 UTC (permalink / raw)
  To: Keir Fraser, Espen Skoglund, Shan, Haitao
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin

Let me describe an experiment I did after I discovered this issue.

The device was an 82575EB NIC, and the driver I used was igb 1.0.8
(search http://sourceforge.net/project/showfiles.php?group_id=42302 for
it).
The LSC interrupt is a link status change interrupt. It can happen
physically, or it can be triggered by software, as the driver does in
igb_open() at igb_main.c line 1496, which writes to a special register
(E1000_ICS) to trigger an interrupt event.

I ran the experiment again on Linux 2.6.23 with this driver. I tried to
a) change handle_edge_irq() from mask-and-ack to ack-only for an
interrupt that arrives while the previous one is still in progress (see
the patch below), and b) comment out line 1496 in the driver.

The results are:
1) With mask and ack, the interrupt happens 3 times. The last 2 are
masked because they arrived while the first one was still pending for
the ISR handler. The system is OK.
2) With ack and no mask, the interrupt happens continuously and the
system hangs forever.
3) With ack and no mask, and with line 1496 removed (i.e. no
software-triggered interrupt), the interrupt happens twice and the
system is OK.

So I suppose the problem happens only if the interrupt is triggered by
software. I also consulted the HW engineer but didn't get confirmation;
the only answer I got is that PCI-E needs a rising edge before sending
the 2nd interrupt :(

I'm not sure whether there are other BRAIN-DEAD devices like this. I
only have this device to test the MSI-X function, but we may need to
make sure it will not break the whole system.

The callback is needed because we are using the ACK-new method to work
around this issue. Yes, it is expensive. Also, the ACK-new method may
cause a deadlock, as Haitao described in his mail.

But if we move the config space to the HV, then we don't need the
ACK-new method. That should be OK but, admittedly, it should be the last
method we turn to, since the config space should be owned by domain0.

Thanks
-- Yunhong Jiang

The patch to ack without masking the MSI-X interrupt is below:

--- kernel/irq/chip.c   2008-03-28 13:23:51.000000000 -0400
+++ ../linux-2.6.23/kernel/irq/chip.c   2007-10-09 16:31:38.000000000 -0400
@@ -439,9 +439,7 @@
  *     the handler was running. If all pending interrupts are handled, the
  *     loop is left.
  */
-
-extern struct irq_chip msi_chip ;
-void
+void fastcall
 handle_edge_irq(unsigned int irq, struct irq_desc *desc)
 {
        const unsigned int cpu = smp_processor_id();
@@ -457,23 +455,11 @@
         */
        if (unlikely((desc->status & (IRQ_INPROGRESS | IRQ_DISABLED)) ||
                    !desc->action)) {
-
-        if (desc->chip == &msi_chip)
-            printk("mask msi chip irq %x cpu %x desc->status %x desc->action %p tsc %lx\n", irq, cpu, desc->status, desc->action, tsc_this);
-
                desc->status |= (IRQ_PENDING | IRQ_MASKED);
-        if (desc->chip == &msi_chip)
-        {
-               desc->chip->ack(irq);
-        }else
                mask_ack_irq(desc, irq);
-
                goto out_unlock;
        }


Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:
> This requires the guest to call back into Xen to signal EOI (as we already
> do for legacy level-triggered interrupts). We shouldn't really need to do
> that for MSI and it's rather more expensive than a couple of accesses over
> the PCI bus!
> 
> It's this callback into Xen, which we do not really understand why it's
> needed, which I'm railing against. Is there some fundamental aspect of MSI
> we do not understand, or are we working around one brain-dead or buggy
> device?
> 
> -- Keir
> 
> On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> 
>> Not masking each time when interrupt happen, instead, we do that only
>> when the second interrupt happen while the previous one is still
>> pending, it should be something like handle_edge_irqs() in upstream
>> linux.
>> 
>> -- Yunhong Jiang
>> 
>> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>>> capability structure or MSI-X table within the interrupt handler is
>>> insane.  It requires accesses over the PCI/PCIe bus and is clearly
>>> something you want to avoid on the fast path.
>>> 
>>> eSk
>>> 
>>> 
>>> [Haitao Shan]
>>>>     There are no much changes made compared with the original patches.
>>>> But there do have some issues that we need your kind comments.
>>> 
>>>>   1> ACK-NEW method is necessary to avoid IRQ storm. But it causes the
>>>> deadlock. During my tests, I do find there can be deadlock with
>>>> patches applied. When assigned a NIC device to HVM domain, the scenario
>>>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is waiting
>>>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector
>>>> 0x31) is waiting for injection to HVM domain since it is blocked now; IDE
>>>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>>>> priority but not ACKed by XEN now. When IDE interrupt and NIC interrupt
>>>> are delivered to the same CPU, and when guest OS is Vista, the
>>>> phenomenon is easy to be observed.
>>> 
>>>>   2> Without ACK-NEW, some naughty NIC devices as we observed will
>>>> bring IRQ storms. For this phenomenon, I think Yunhong can comment more.
>>>> Basically, writing EOI without mask the source of MSI will bring IRQ
>>>> storm. Although the reason is under investigation, XEN should anyhow
>>>> handle such bogus device, right?
>>> 
>>>>   3> Using ACK-OLD and masking the MSI when writing EOI can be
>>>> solution. However, XEN does not own PCI configuration spaces.

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-28  8:40       ` Jiang, Yunhong
@ 2008-03-28  9:16         ` Keir Fraser
  2008-03-28  9:35           ` Jiang, Yunhong
  2008-03-31 13:57           ` Jiang, Yunhong
  0 siblings, 2 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-28  9:16 UTC (permalink / raw)
  To: Jiang, Yunhong, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

On 28/3/08 08:40, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> The investigation result is,
> 1) if mask and ack the interrupt, the interrupt will happen 3 times, the
> last 2 is masked because they happened when the first one is still
> pending for ISR's handler, the system is ok.

How can you tell it happened three times? If the interrupt is pending in the
ISR then only one further pending interrupt can become visible to software
as there is only one pending bit per vector in the IRR.
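
The point about the IRR can be illustrated with a toy model of a single
vector. It assumes only what is stated above, that each vector has one
pending bit and one in-service bit; the `vector_state` type and its
counters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One vector in a toy local APIC model (hypothetical names). */
struct vector_state {
    bool irr;        /* the single pending bit for this vector */
    bool isr;        /* the single in-service bit for this vector */
    unsigned sent;   /* messages the device sent */
    unsigned seen;   /* interrupts that software actually took */
};

/* Move a pending interrupt into service if nothing is in service. */
static void try_deliver(struct vector_state *v)
{
    if (v->irr && !v->isr) {
        v->irr = false;
        v->isr = true;
        v->seen++;
    }
}

/* An MSI for this vector arrives: setting the pending bit is idempotent. */
static void msg_arrive(struct vector_state *v)
{
    v->sent++;
    v->irr = true;
    try_deliver(v);
}

/* Software writes EOI: the next pending interrupt, if any, is delivered. */
static void eoi(struct vector_state *v)
{
    v->isr = false;
    try_deliver(v);
}
```

If one message is taken and three more arrive before the EOI, the three
collapse into the one pending bit, so software sees two interrupts in
total for four messages sent.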

> So I suppose the problem happens only if trigger the interrupt by
> software. I consulted the HW engineer also but didn't get confirmation,
> the only answer I got is, the PCI-E need a rising edge before send the
> 2nd interrupt :(

That answer means very little to me. One interesting question to have
answered would be: is this a closed-loop or open-loop interrupt storm? I.e.,
does the device somehow detect APIC EOI and then trigger re-send of the MSI
(closed loop) or is this an initialisation-time-only open-loop storm where
the device is spitting out the MSI regularly until some device register gets
written by the interrupt service routine?

Given the circumstances, I'm inclined to think it is the latter. Especially
since I think the former is impossible as APIC EOI is not visible outside
the processor unless the interrupt came from a level-triggered IO-APIC pin,
and even then the EOI would not be visible across the PCI bus!

Also it seems *very* likely that this is just an initialisation-time thing,
and the device probably behaves very nicely after it is bootstrapped. In
light of this I think we should treat MSI sources as ACKTYPE_NONE in Xen
(i.e., require no callback from guest to hypervisor on completion of the
interrupt handler). We can then handle the interrupt storm entirely within
the hypervisor by detecting the storm and masking the interrupt and only
unmasking on some timeout.
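
One way the detect, mask and timed-unmask idea could look is sketched
below. Everything in it is an assumption for illustration: the
`storm_detector` structure, the fixed counting window, the threshold and
the tick-based clock are hypothetical, and the actual masking and
unmasking of the MSI source is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt storm detector; times are in arbitrary ticks. */
struct storm_detector {
    uint64_t window_start;  /* start of the current counting window */
    uint32_t count;         /* interrupts seen in the current window */
    uint32_t threshold;     /* more than this per window is a storm */
    uint64_t window_len;    /* length of one counting window */
    uint64_t mask_timeout;  /* how long to keep the source masked */
    uint64_t unmask_at;     /* valid only while masked */
    bool masked;
};

/* Call on every interrupt. Returns true if the caller should mask now. */
static bool storm_on_irq(struct storm_detector *s, uint64_t now)
{
    assert(s->window_len > 0);

    if (now - s->window_start >= s->window_len) {
        /* The old window expired: start counting afresh. */
        s->window_start = now;
        s->count = 0;
    }
    if (++s->count > s->threshold && !s->masked) {
        s->masked = true;
        s->unmask_at = now + s->mask_timeout;
        return true;
    }
    return false;
}

/* Call from a timer. Returns true if the caller should unmask now. */
static bool storm_on_timer(struct storm_detector *s, uint64_t now)
{
    if (s->masked && now >= s->unmask_at) {
        s->masked = false;
        s->window_start = now;
        s->count = 0;
        return true;
    }
    return false;
}
```

With a threshold of 3 per 100-tick window, the fourth interrupt inside
one window asks the caller to mask, and the first timer call at or after
the deadline asks it to unmask. A device that storms only during
initialisation would trip this once and then run unmasked.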

In your tests, how aggressive was the IRQ storm? If you looked at the
interrupted EIP on each interrupt, was it immediately after the APIC was
EOIed and EFLAGS.IF set back to 1, or was it some time after? This tells us
how aggressively the device is sending out MSIs, and may determine how
cunning we need to be regarding interrupt storm detection.

> I'm not sure if there are any other BRAIN-DEAD device like this, I only
> have this device to test MSI-X function, but we may need make sure it
> will not break the whole system.

Yes, we have to handle this case, unfortunately.

> The call-back to guest because we are using the ACK-new method to work
> around this issue. Yes, it is expensive, Also, this ACK-new method may
> cause deadlock as Haitao suggested in the mail.

Yes, that sucks. See my previous email -- if possible it would be great to
teach Xen enough about the PCI config space to be able to mask MSIs.

> But if we move the config space to HV, then we don't need this ACK-new
> method, that should be ok, but admittedly, that should be the last
> method we turn to, since config-space should be owned by domain0.

A partial movement into the hypervisor may be the best of a choice of evils.

 -- Keir


* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-28  9:16         ` Keir Fraser
@ 2008-03-28  9:35           ` Jiang, Yunhong
  2008-03-31 13:57           ` Jiang, Yunhong
  1 sibling, 0 replies; 22+ messages in thread
From: Jiang, Yunhong @ 2008-03-28  9:35 UTC (permalink / raw)
  To: Keir Fraser, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

xen-devel-bounces@lists.xensource.com <> wrote:
> On 28/3/08 08:40, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> 
>> The investigation result is,
>> 1) if mask and ack the interrupt, the interrupt will happen 3 times, the
>> last 2 is masked because they happened when the first one is still
>> pending for ISR's handler, the system is ok.
> 
> How can you tell it happened three times? If the interrupt is pending in the
> ISR then only one further pending interrupt can become visible to software
> as there is only one pending bit per vector in the IRR.

There are two types of MSI interrupt: one for receive/transmit and one
for everything else (this is the one causing the storm). I added a
printk for when an interrupt happens while the previous one is in
progress. Then I added the printk count to the count shown in
/proc/interrupts. The count in /proc/interrupts is only 1.

> 
>> So I suppose the problem happens only if trigger the interrupt by
>> software. I consulted the HW engineer also but didn't get confirmation,
>> the only answer I got is, the PCI-E need a rising edge before send the
>> 2nd interrupt :(
> 
> That answer means very little to me. One interesting question to have
> answered would be: is this a closed-loop or open-loop interrupt storm? I.e.,
> does the device somehow detect APIC EOI and then trigger re-send of the MSI
> (closed loop) or is this an initialisation-time-only open-loop storm where
> the device is spitting out the MSI regularly until some device register gets
> written by the interrupt service routine?
> 
> Given the circumstances, I'm inclined to think it is the latter. Especially
> since I think the former is impossible as APIC EOI is not visible outside
> the processor unless the interrupt came from a level-triggered IO-APIC pin,
> and even then the EOI would not be visible across the PCI bus!
> 
> Also it seems *very* likely that this is just an initialisation-time thing,
> and the device probably behaves very nicely after it is bootstrapped. In

I can't tell, because this interrupt didn't happen again after the
device was up. Maybe I can change the driver to do more experiments.

> light of this I think we should treat MSI sources as ACKTYPE_NONE in Xen
> (i.e., require no callback from guest to hypervisor on completion of the
> interrupt handler). We can then handle the interrupt storm entirely within
> the hypervisor by detecting the storm and masking the interrupt and only
> unmasking on some timeout.
> 
> In your tests, how aggressive was the IRQ storm? If you looked at the
> interrupted EIP on each interrupt, was it immediately after the APIC was
> EOIed and EFLAGS.IF set back to 1, or was it some time after? This tells us
> how aggressively the device is sending out MSIs, and may determine how
> cunning we need to be regarding interrupt storm detection.

I will try that.

> 
>> I'm not sure if there are any other BRAIN-DEAD device like this, I only
>> have this device to test MSI-X function, but we may need make sure it
>> will not break the whole system.
> 
> Yes, we have to handle this case, unfortunately.
> 
>> The call-back to guest because we are using the ACK-new method to work
>> around this issue. Yes, it is expensive, Also, this ACK-new method may
>> cause deadlock as Haitao suggested in the mail.
> 
> Yes, that sucks. See my previous email -- if possible it would be great to
> teach Xen enough about the PCI config space to be able to mask MSIs.

In fact, Xen is already trying to access config space, although that is
still a bug at the moment. In VT-d, Xen tries to access FLR directly :)

> 
>> But if we move the config space to HV, then we don't need this ACK-new
>> method, that should be ok, but admittedly, that should be the last
>> method we turn to, since config-space should be owned by domain0.
> 
> A partial movement into the hypervisor may be the best of a choice of evils.

Sure, we will do that! 

> -- Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-28  7:24     ` Keir Fraser
  2008-03-28  8:40       ` Jiang, Yunhong
@ 2008-03-28 11:37       ` Espen Skoglund
  2008-03-28 11:53         ` Keir Fraser
  1 sibling, 1 reply; 22+ messages in thread
From: Espen Skoglund @ 2008-03-28 11:37 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Tian, Kevin, xen-devel, Espen Skoglund, Jiang, Yunhong,
	Shan, Haitao, Keir Fraser, Li, Xin B

That is true.  I was quite puzzled with the requirement of the
callback into Xen myself.  In standard Linux MSI interrupts are
treated as edge triggered and are just acked in the local APIC upon
delivery.

	eSk



[Keir Fraser]
> This requires the guest to call back into Xen to signal EOI (as we already
> do for legacy level-triggered interrupts). We shouldn't really need to do
> that for MSI and it's rather more expensive than a couple of accesses over
> the PCI bus!

> It's this callback into Xen, which we do not really understand why it's
> needed, which I'm railing against. Is there some fundamental aspect of MSI
> we do not understand, or are we working around one brain-dead or buggy
> device?

>  -- Keir

> On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Not masking each time when interrupt happen, instead, we do that only
>> when the second interrupt happen while the previous one is still
>> pending, it should be something like handle_edge_irqs() in upstream
>> linux.
>> 
>> -- Yunhong Jiang
>> 
>> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>>> capabilty structure or MSI-X table within the interrupt handler is
>>> insane.  It requires accesses over the PCI/PCIe bus and is clearly
>>> something you want to avoid on the fast path.
>>> 
>>> eSk
>>> 
>>> 
>>> [Haitao Shan]
>>>> There are no much changes made compared with the original
>> patches.
>>>> But there do have some issues that we need your kind comments.
>>> 
>>>> 1> ACK-NEW method is necessary to avoid IRQ storm. But it causes
>> the
>>>> deadlock. During my tests, I do find there can be deadlock
>> with
>>>> patches applied. When assigned a NIC device to HVM domain, the
>> scenario
>>>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is
>> waiting
>>>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector
>> 0x31)
>>>> is waiting for injection to HVM domain since it is blocked now; IDE
>>>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>>>> priority but not ACKed by XEN now. When IDE interrupt and NIC
>> interrupt
>>>> are delivered to the same CPU, and when guest OS is Vista, the
>>>> phenomenon is easy to be observed.
>>> 
>>>> 2> Without ACK-NEW, some naughty NIC devices as we observed will
>>>> bring IRQ storms. For this phenomenon, I think Yunhong can comment
>> more.
>>>> Basically, writing EOI without mask the source of MSI will bring IRQ
>>>> storm. Although the reason is under investigation, XEN should anyhow
>>>> handle such bogous device, right?
>>> 
>>>> 3> Using ACK-OLD and masking the MSI when writing EOI can be
>>>> solution. However, XEN does not own PCI configuration spaces.
>> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-28 11:37       ` Espen Skoglund
@ 2008-03-28 11:53         ` Keir Fraser
  2008-03-28 12:15           ` Espen Skoglund
  0 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2008-03-28 11:53 UTC (permalink / raw)
  To: Espen Skoglund
  Cc: Tian, Kevin, xen-devel, Jiang, Yunhong, Shan, Haitao, Keir Fraser,
	Li, Xin B

I think Linux EOIs on ->end() not on ->ack(). Which is fine since Linux
doesn't defer or otherwise schedule ISR handlers.

 -- Keir

On 28/3/08 11:37, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

> That is true.  I was quite puzzled with the requirement of the
> callback into Xen myself.  In standard Linux MSI interrupts are
> treated as edge triggered and are just acked in the local APIC upon
> delivery.
> 
> eSk
> 
> 
> 
> [Keir Fraser]
>> This requires the guest to call back into Xen to signal EOI (as we already
>> do for legacy level-triggered interrupts). We shouldn't really need to do
>> that for MSI and it's rather more expensive than a couple of accesses over
>> the PCI bus!
> 
>> It's this callback into Xen, which we do not really understand why it's
>> needed, which I'm railing against. Is there some fundamental aspect of MSI
>> we do not understand, or are we working around one brain-dead or buggy
>> device?
> 
>>  -- Keir
> 
>> On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> 
>>> Not masking each time when interrupt happen, instead, we do that only
>>> when the second interrupt happen while the previous one is still
>>> pending, it should be something like handle_edge_irqs() in upstream
>>> linux.
>>> 
>>> -- Yunhong Jiang
>>> 
>>> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>>>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>>>> capabilty structure or MSI-X table within the interrupt handler is
>>>> insane.  It requires accesses over the PCI/PCIe bus and is clearly
>>>> something you want to avoid on the fast path.
>>>> 
>>>> eSk
>>>> 
>>>> 
>>>> [Haitao Shan]
>>>>> There are no much changes made compared with the original
>>> patches.
>>>>> But there do have some issues that we need your kind comments.
>>>> 
>>>>> 1> ACK-NEW method is necessary to avoid IRQ storm. But it causes
>>> the
>>>>> deadlock. During my tests, I do find there can be deadlock
>>> with
>>>>> patches applied. When assigned a NIC device to HVM domain, the
>>> scenario
>>>>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is
>>> waiting
>>>>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector
>>> 0x31)
>>>>> is waiting for injection to HVM domain since it is blocked now; IDE
>>>>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>>>>> priority but not ACKed by XEN now. When IDE interrupt and NIC
>>> interrupt
>>>>> are delivered to the same CPU, and when guest OS is Vista, the
>>>>> phenomenon is easy to be observed.
>>>> 
>>>>> 2> Without ACK-NEW, some naughty NIC devices as we observed will
>>>>> bring IRQ storms. For this phenomenon, I think Yunhong can comment
>>> more.
>>>>> Basically, writing EOI without mask the source of MSI will bring IRQ
>>>>> storm. Although the reason is under investigation, XEN should anyhow
>>>>> handle such bogous device, right?
>>>> 
>>>>> 3> Using ACK-OLD and masking the MSI when writing EOI can be
>>>>> solution. However, XEN does not own PCI configuration spaces.
>>> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-28 11:53         ` Keir Fraser
@ 2008-03-28 12:15           ` Espen Skoglund
  2008-03-28 13:00             ` Keir Fraser
  0 siblings, 1 reply; 22+ messages in thread
From: Espen Skoglund @ 2008-03-28 12:15 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Tian, Kevin, xen-devel, Espen Skoglund, Jiang, Yunhong,
	Shan, Haitao, Keir Fraser, Li, Xin B

Just checked this.  Linux does the local APIC EOI on ->ack().

	eSk


[Keir Fraser]
> I think Linux EOIs on ->end() not on ->ack(). Which is fine since
> Linux doesn't defer or otherwise schedule ISR handlers.

>  -- Keir

> On 28/3/08 11:37, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

>> That is true.  I was quite puzzled with the requirement of the
>> callback into Xen myself.  In standard Linux MSI interrupts are
>> treated as edge triggered and are just acked in the local APIC upon
>> delivery.
>> 
>> eSk
>> 
>> 
>> 
>> [Keir Fraser]
>>> This requires the guest to call back into Xen to signal EOI (as we already
>>> do for legacy level-triggered interrupts). We shouldn't really need to do
>>> that for MSI and it's rather more expensive than a couple of accesses over
>>> the PCI bus!
>> 
>>> It's this callback into Xen, which we do not really understand why it's
>>> needed, which I'm railing against. Is there some fundamental aspect of MSI
>>> we do not understand, or are we working around one brain-dead or buggy
>>> device?
>> 
>>> -- Keir
>> 
>>> On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>> 
>>>> Not masking each time when interrupt happen, instead, we do that only
>>>> when the second interrupt happen while the previous one is still
>>>> pending, it should be something like handle_edge_irqs() in upstream
>>>> linux.
>>>> 
>>>> -- Yunhong Jiang
>>>> 
>>>> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>>>>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>>>>> capabilty structure or MSI-X table within the interrupt handler is
>>>>> insane.  It requires accesses over the PCI/PCIe bus and is clearly
>>>>> something you want to avoid on the fast path.
>>>>> 
>>>>> eSk
>>>>> 
>>>>> 
>>>>> [Haitao Shan]
>>>>>> There are no much changes made compared with the original
>>>> patches.
>>>>>> But there do have some issues that we need your kind comments.
>>>>> 
>>>>>> 1> ACK-NEW method is necessary to avoid IRQ storm. But it causes
>>>> the
>>>>>> deadlock. During my tests, I do find there can be deadlock
>>>> with
>>>>>> patches applied. When assigned a NIC device to HVM domain, the
>>>> scenario
>>>>>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is
>>>> waiting
>>>>>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector
>>>> 0x31)
>>>>>> is waiting for injection to HVM domain since it is blocked now; IDE
>>>>>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>>>>>> priority but not ACKed by XEN now. When IDE interrupt and NIC
>>>> interrupt
>>>>>> are delivered to the same CPU, and when guest OS is Vista, the
>>>>>> phenomenon is easy to be observed.
>>>>> 
>>>>>> 2> Without ACK-NEW, some naughty NIC devices as we observed will
>>>>>> bring IRQ storms. For this phenomenon, I think Yunhong can comment
>>>> more.
>>>>>> Basically, writing EOI without mask the source of MSI will bring IRQ
>>>>>> storm. Although the reason is under investigation, XEN should anyhow
>>>>>> handle such bogous device, right?
>>>>> 
>>>>>> 3> Using ACK-OLD and masking the MSI when writing EOI can be
>>>>>> solution. However, XEN does not own PCI configuration spaces.
>>>> 
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-28 12:15           ` Espen Skoglund
@ 2008-03-28 13:00             ` Keir Fraser
  0 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-28 13:00 UTC (permalink / raw)
  To: Espen Skoglund
  Cc: Tian, Kevin, xen-devel, Jiang, Yunhong, Shan, Haitao, Keir Fraser,
	Li, Xin B

Oh yes, that is true. They then have special logic for detecting nested
delivery and mask/unmask in that case. Fair enough, and similar to what we
should do in Xen.
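
For illustration, the "EOI on delivery, mask only on nested delivery" flow
discussed here could look roughly like the C sketch below. All names
(struct edge_irq, edge_irq_deliver, edge_irq_complete) are hypothetical and
the hardware accesses are stubbed out; this is a model of the idea under
those assumptions, not the Linux or Xen code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-IRQ state for an edge-triggered (MSI) source. */
struct edge_irq {
    bool in_progress;   /* a handler instance is currently running */
    bool pending;       /* another edge arrived while in_progress */
    bool masked;        /* source is masked at the device */
    int  eoi_count;     /* number of local APIC EOIs issued */
    int  handled;       /* number of times the handler body completed */
};

/* Stubs: real code would write the local APIC EOI register and the
 * MSI/MSI-X mask bit here. */
static void lapic_eoi(struct edge_irq *irq)     { irq->eoi_count++; }
static void mask_source(struct edge_irq *irq)   { irq->masked = true; }
static void unmask_source(struct edge_irq *irq) { irq->masked = false; }

/* Delivery path. Returns true if the caller should run the handler now. */
static bool edge_irq_deliver(struct edge_irq *irq)
{
    lapic_eoi(irq);              /* EOI immediately, no guest callback */
    if (irq->in_progress) {
        irq->pending = true;     /* remember the edge for later replay */
        mask_source(irq);        /* slow path: only taken on nesting */
        return false;
    }
    irq->in_progress = true;
    return true;
}

/* Completion path. Returns true if the handler must run once more to
 * service an edge that arrived while it was running. */
static bool edge_irq_complete(struct edge_irq *irq)
{
    assert(irq->in_progress);
    irq->handled++;
    if (irq->pending) {
        irq->pending = false;
        unmask_source(irq);      /* undo the mask taken on nesting */
        return true;             /* stay in_progress and replay */
    }
    irq->in_progress = false;
    return false;
}
```

In this model the PCI accesses for mask/unmask stay off the fast path: a
well-behaved device that never raises a second edge while the first is being
handled never gets masked.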

 -- Keir

On 28/3/08 12:15, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

> Just checked this.  Linux does the local APIC EOI on ->ack().
> 
> eSk
> 
> 
> [Keir Fraser]
>> I think Linux EOIs on ->end() not on ->ack(). Which is fine since
>> Linux doesn't defer or otherwise schedule ISR handlers.
> 
>>  -- Keir
> 
>> On 28/3/08 11:37, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
> 
>>> That is true.  I was quite puzzled with the requirement of the
>>> callback into Xen myself.  In standard Linux MSI interrupts are
>>> treated as edge triggered and are just acked in the local APIC upon
>>> delivery.
>>> 
>>> eSk
>>> 
>>> 
>>> 
>>> [Keir Fraser]
>>>> This requires the guest to call back into Xen to signal EOI (as we already
>>>> do for legacy level-triggered interrupts). We shouldn't really need to do
>>>> that for MSI and it's rather more expensive than a couple of accesses over
>>>> the PCI bus!
>>> 
>>>> It's this callback into Xen, which we do not really understand why it's
>>>> needed, which I'm railing against. Is there some fundamental aspect of MSI
>>>> we do not understand, or are we working around one brain-dead or buggy
>>>> device?
>>> 
>>>> -- Keir
>>> 
>>>> On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>> 
>>>>> Not masking each time when interrupt happen, instead, we do that only
>>>>> when the second interrupt happen while the previous one is still
>>>>> pending, it should be something like handle_edge_irqs() in upstream
>>>>> linux.
>>>>> 
>>>>> -- Yunhong Jiang
>>>>> 
>>>>> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>>>>>> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
>>>>>> capabilty structure or MSI-X table within the interrupt handler is
>>>>>> insane.  It requires accesses over the PCI/PCIe bus and is clearly
>>>>>> something you want to avoid on the fast path.
>>>>>> 
>>>>>> eSk
>>>>>> 
>>>>>> 
>>>>>> [Haitao Shan]
>>>>>>> There are no much changes made compared with the original
>>>>> patches.
>>>>>>> But there do have some issues that we need your kind comments.
>>>>>> 
>>>>>>> 1> ACK-NEW method is necessary to avoid IRQ storm. But it causes
>>>>> the
>>>>>>> deadlock. During my tests, I do find there can be deadlock
>>>>> with
>>>>>>> patches applied. When assigned a NIC device to HVM domain, the
>>>>> scenario
>>>>>>> is: Dom0 is waiting to IDE interrupt (vector 0x21); HVM domain is
>>>>> waiting
>>>>>>> for qemu's IDE emulation and thus blocked; NIC interrupt (MSI vector
>>>>> 0x31)
>>>>>>> is waiting for injection to HVM domain since it is blocked now; IDE
>>>>>>> interrupt is waiting for NIC interrupt since NIC interrupt is of high
>>>>>>> priority but not ACKed by XEN now. When IDE interrupt and NIC
>>>>> interrupt
>>>>>>> are delivered to the same CPU, and when guest OS is Vista, the
>>>>>>> phenomenon is easy to be observed.
>>>>>> 
>>>>>>> 2> Without ACK-NEW, some naughty NIC devices as we observed will
>>>>>>> bring IRQ storms. For this phenomenon, I think Yunhong can comment
>>>>> more.
>>>>>>> Basically, writing EOI without mask the source of MSI will bring IRQ
>>>>>>> storm. Although the reason is under investigation, XEN should anyhow
>>>>>>> handle such bogous device, right?
>>>>>> 
>>>>>>> 3> Using ACK-OLD and masking the MSI when writing EOI can be
>>>>>>> solution. However, XEN does not own PCI configuration spaces.
>>>>> 
>>> 
>>> 
>>> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-28  9:16         ` Keir Fraser
  2008-03-28  9:35           ` Jiang, Yunhong
@ 2008-03-31 13:57           ` Jiang, Yunhong
  2008-03-31 14:14             ` Keir Fraser
  1 sibling, 1 reply; 22+ messages in thread
From: Jiang, Yunhong @ 2008-03-31 13:57 UTC (permalink / raw)
  To: Jiang, Yunhong, Keir Fraser, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

Keir, when I tried to get the ip address today, I suddenly found I can't
reproduce it anymore. Also, originally, if I removed the code that triggers
the software LSC interrupt, the NIC could still work and get an IP address,
but now if I remove that code, the NIC doesn't work anymore.
This is really strange to me. I didn't change anything on the system, and
I don't know of any change in the lab environment that could cause this.
But before, I could reproduce it every time.

It is really frustrating to get this :-( . Do you think we still need to
move the config space access down? Now the only reason to move it down is
that ack_edge_ioapic_irq() does the mask, and this mask can make the HV
more robust.

Thanks
-- Yunhong Jiang


Jiang, Yunhong <> wrote:
> xen-devel-bounces@lists.xensource.com <> wrote:
>> On 28/3/08 08:40, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>> 
>>> The investigation result is,
>>> 1) if mask and ack the interrupt, the interrupt will happen 3 times,
the
>>> last 2 is masked because they happened when the first one is still
>>> pending for ISR's handler, the system is ok.
>> 
>> How can you tell it happened three times? If the interrupt is pending
in
>> the ISR then only one further pending interrupt can become visible
>> to software
>> as there is only one pending bit per vector in the IRR.
> 
> There are two type of msi interrupt, one for receive/transmit,
> one for other (this is the one cuase storm). I add printk if
> interrupt happen while previous is in progress. Then I added
> the print number and the output in /prot/interrupt. The output in
> /prco/interrupt is only 1. 
> 
>> 
>>> So I suppose the problem happens only if trigger the interrupt by
>>> software. I consulted the HW engineer also but didn't get
confirmation,
>>> the only answer I got is, the PCI-E need a rising edge before send
the
>>> 2nd interrupt :(
>> 
>> That answer means very little to me. One interesting question to have
>> answered would be: is this a closed-loop or open-loop
>> interrupt storm? I.e.,
>> does the device somehow detect API EOI and then trigger
>> re-send of the MSI
>> (closed loop) or is this an initialisation-time-only open-loop
>> storm where
>> the device is spitting out the MSI regularly until some device
register
>> gets written by the interrupt service routine?
>> 
>> Given the circumstances, I'm inclined to think it is the
>> latter. Especially
>> since I think the former is impossible as EPIC EOI is not
>> visible outside
>> the processor unless the interrupt came from a level-triggered
IO-APIC pin,
>> and even then the EOI would not be visible across the PCI bus!
>> 
>> Also it seems *very* likely that this is just an
>> initialisation-time thing,
>> and the device probably behaves very nicely after it is
>> bootstrapped. In
> 
> I can't tell this becuase this interrupt didn't happen again
> after the device is up. Maybe I can change the driver to do more
> experiement. 
> 
>> light of this I think we should treat MSI sources as
>> ACKTYPE_NONE in Xen
>> (i.e, require no callback from guest to hypervisor on completion of
the
>> interrupt handler). We can then handle the interrupt storm
>> entirely within
>> the hypervisor by detecting the storm and masking the
>> interrupt and only
>> unmasking on some timeout.
>> 
>> In your tests, how aggressive was the IRQ storm? If you looked at the
>> interrupted EIP on each interrupt, was it immediately after
>> the APIC was
>> EOIed and EFLAGS.IF set back to 1, or was it some time after?
>> This tells us
>> how aggressively the device is sending out EOIs, and may determine
how
>> cunning we need to be regarding interrupt storm detection.
> 
> I will try that.
> 
>> 
>>> I'm not sure if there are any other BRAIN-DEAD device like this, I
only
>>> have this device to test MSI-X function, but we may need make sure
it
>>> will not break the whole system.
>> 
>> Yes, we have to handle this case, unfortunately.
>> 
>>> The call-back to guest because we are using the ACK-new method to
work
>>> around this issue. Yes, it is expensive, Also, this ACK-new method
may
>>> cause deadlock as Haitao suggested in the mail.
>> 
>> Yes, that sucks. See my previous email -- if possible it would
>> be great to
>> teach Xen enough about the PCI config space to be able to mask MSIs.
> In fact, currently xen is already tryting to access config
> space, althought that is a bug still currently. In vt-d, xen try to
access
> FLR directly :) 
> 
>> 
>>> But if we move the config space to HV, then we don't need this
ACK-new
>>> method, that should be ok, but admittedly, that should be the last
>>> method we we turn to, since config-space should be owned by domain0.
>> 
>> A partial movement into the hypervisor may be the best of a
>> choice of evils.
> 
> Sure, we will do that!
> 
>> -- Keir
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-31 13:57           ` Jiang, Yunhong
@ 2008-03-31 14:14             ` Keir Fraser
  2008-03-31 14:15               ` Keir Fraser
  2008-03-31 14:25               ` Jiang, Yunhong
  0 siblings, 2 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-31 14:14 UTC (permalink / raw)
  To: Jiang, Yunhong, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

On 31/3/08 14:57, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Keir, when I try to get the ip address today, I suddenly found I can't
> reproduce it anymore, also orginally if I removed the code that trigger
> the software LSC interrupt, the NIC can still work and get IP address,
> but now if I remove that code, the NIC can't work anymore.
> It is really strange to me, I did't change anything to the system. Also
> I don't know any changes in the lab environment that may cause this
> change. But I do can reproduce it before each time.
> 
> Really frustrated to get this :-( , do you think we still need move the
> config space access down, now the only reasons to move this down is,
> ack_edge_ioapic_irq() did the mask, and this mask can make HV more
> robust.

So, if you leave the driver as it is (triggering the software LSC
interrupt), do APIC EOI in Xen before executing the interrupt handler in
dom0, and do not mask the MSI at all, then you no longer hang?

That's a weird change in behaviour if so!

I wonder whether there is a timing issue of some sort, and it depends if the
NIC generates the software-triggered interrupt at a fast enough rate that
the host CPU fails to make progress if it doesn't mask the MSI? You haven't
changed test machine at all, or put the NIC in a different PCI slot, or
anything like that?

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-31 14:14             ` Keir Fraser
@ 2008-03-31 14:15               ` Keir Fraser
  2008-03-31 14:25               ` Jiang, Yunhong
  1 sibling, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-31 14:15 UTC (permalink / raw)
  To: Jiang, Yunhong, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

On 31/3/08 15:14, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> I wonder whether there is a timing issue of some sort, and it depends if the
> NIC generates the software-triggered interrupt at a fast enough rate that the
> host CPU fails to make progress if it doesn't mask the MSI? You haven't
> changed test machine at all, or put the NIC in a different PCI slot, or
> anything like that?

Also, it's got to be worth kicking your hardware guys again and find out
from them exactly what happens when that software-triggered interrupt
register gets written by the device driver. Their previous response didn't
sound very enlightening.

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-31 14:14             ` Keir Fraser
  2008-03-31 14:15               ` Keir Fraser
@ 2008-03-31 14:25               ` Jiang, Yunhong
  2008-03-31 14:33                 ` Keir Fraser
  1 sibling, 1 reply; 22+ messages in thread
From: Jiang, Yunhong @ 2008-03-31 14:25 UTC (permalink / raw)
  To: Keir Fraser, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:
> On 31/3/08 14:57, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> 
>> Keir, when I try to get the ip address today, I suddenly found I
can't
>> reproduce it anymore, also orginally if I removed the code that
trigger
>> the software LSC interrupt, the NIC can still work and get IP
address,
>> but now if I remove that code, the NIC can't work anymore.
>> It is really strange to me, I did't change anything to the system.
Also
>> I don't know any changes in the lab environment that may cause this
>> change. But I do can reproduce it before each time.
>> 
>> Really frustrated to get this :-( , do you think we still need move
the
>> config space access down, now the only reasons to move this down is,
>> ack_edge_ioapic_irq() did the mask, and this mask can make HV more
>> robust.
> 
> So, if you leave the driver as it is (triggering the software LSC
> interrupt), do APIC EOI in Xen before executing the interrupt
> handler in
> dom0, and do not mask the MSI at all, then you no longer hang?

I usually do the experiments in the Linux kernel, and it no longer hangs.

> 
> That's a weird change in behaviour if so!
> 
> I wonder whether there is a timing issue of some sort, and it
> depends if the
> NIC generates the software-triggered interrupt at a fast
> enough rate that
> the host CPU fails to make progress if it doesn't mask the
> MSI? You haven't
> changed test machine at all, or put the NIC in a different PCI slot,
or
> anything like that? 

I haven't changed anything at all. The machine is in the lab, which is far
away from my cube, and I just stayed at home over the weekend.

> 
> -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-31 14:25               ` Jiang, Yunhong
@ 2008-03-31 14:33                 ` Keir Fraser
  2008-04-01  2:39                   ` Shan, Haitao
  0 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2008-03-31 14:33 UTC (permalink / raw)
  To: Jiang, Yunhong, Espen Skoglund, Shan, Haitao
  Cc: Tian, Kevin, xen-devel, Li, Xin B

On 31/3/08 15:25, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> So, if you leave the driver as it is (triggering the software LSC
>> interrupt), do APIC EOI in Xen before executing the interrupt
>> handler in
>> dom0, and do not mask the MSI at all, then you no longer hang?
> 
> I usuually do experiement in linux kernel, and it no longer hang.

Well, I'd be okay with an initial implementation which does not allow Xen to
mask MSIs. But still I think it will be cleaner and more extensible to have
Xen program the MSI registers anyway. This will hide details like interrupt
vector, APIC destination mode, etc. from the MSI-capable guest, and also
will make it easier to support things like changing interrupt affinity on
the fly (since it will not be necessary to get dom0 involved in that).
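
As a rough illustration of the details that would be hidden from the guest,
an x86 MSI message carries the destination APIC ID and destination mode in
the address, and the vector in the data word. The sketch below composes such
a message for a fixed-delivery, edge-triggered interrupt; struct msi_msg and
msi_compose are hypothetical names for this example, not Xen's interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values written to the MSI capability. */
struct msi_msg {
    uint32_t address_lo;   /* Message Address register (low 32 bits) */
    uint32_t data;         /* Message Data register */
};

/* Compose an x86 MSI message: fixed delivery mode, edge-triggered. */
static struct msi_msg msi_compose(uint8_t dest_apic_id, int logical_dest,
                                  uint8_t vector)
{
    struct msi_msg m;

    /* Vectors 0-15 are not valid for APIC-delivered interrupts. */
    assert(vector >= 0x10);

    m.address_lo = 0xFEE00000u                       /* fixed MSI window */
                 | ((uint32_t)dest_apic_id << 12)    /* destination ID */
                 | (logical_dest ? (1u << 2) : 0u);  /* destination mode */

    /* Bits 7:0 vector; bits 10:8 = 000 (fixed); bit 15 = 0 (edge). */
    m.data = vector;
    return m;
}
```

With this model, changing affinity amounts to recomputing address_lo for a
new destination and rewriting the register, for example
msi_compose(1, 0, 0x31) gives address 0xFEE01000 and data 0x31.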

Once you have Xen able to write the MSI registers, I suppose it is not much
extra work to implement some kind of interrupt mitigation scheme involving
mask/enable bits of the MSI configuration register.
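
One possible shape for such a mitigation scheme is sketched below in C:
count interrupts per time window, mask the source when a threshold is
exceeded, and unmask from a timer. The thresholds, the tick-based clock and
all names are assumptions made for this example; the actual mask-bit writes
are only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values, in abstract "ticks". */
#define STORM_WINDOW_TICKS 10u    /* length of one counting window */
#define STORM_THRESHOLD    100u   /* interrupts tolerated per window */
#define STORM_MASK_TICKS   50u    /* how long a stormy source stays masked */

struct msi_mitigation {
    uint64_t     window_start;    /* start of the current counting window */
    unsigned int count;           /* interrupts seen in this window */
    bool         masked;          /* source currently masked */
    uint64_t     unmask_at;       /* tick at which to unmask again */
};

/* Interrupt path. Returns true if the interrupt should be forwarded. */
static bool msi_mitigate_on_irq(struct msi_mitigation *s, uint64_t now)
{
    assert(now >= s->window_start);

    if (s->masked)
        return false;             /* message was in flight when we masked */

    if (now - s->window_start >= STORM_WINDOW_TICKS) {
        s->window_start = now;    /* start a fresh window */
        s->count = 0;
    }

    if (++s->count > STORM_THRESHOLD) {
        s->masked = true;         /* real code: set the MSI mask bit */
        s->unmask_at = now + STORM_MASK_TICKS;
        return false;
    }
    return true;
}

/* Timer path: unmask once the timeout has expired. */
static void msi_mitigate_on_timer(struct msi_mitigation *s, uint64_t now)
{
    if (s->masked && now >= s->unmask_at) {
        s->masked = false;        /* real code: clear the MSI mask bit */
        s->window_start = now;
        s->count = 0;
    }
}
```

In this sketch the mask bit is only touched when a storm is detected, so the
normal interrupt path needs no extra PCI access.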

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/5] Add MSI support to XEN
  2008-03-31 14:33                 ` Keir Fraser
@ 2008-04-01  2:39                   ` Shan, Haitao
  0 siblings, 0 replies; 22+ messages in thread
From: Shan, Haitao @ 2008-04-01  2:39 UTC (permalink / raw)
  To: Keir Fraser, Jiang, Yunhong, Espen Skoglund
  Cc: Tian, Kevin, xen-devel, Li, Xin B

Hi, Keir,

I am working on that and incorporating your comments. I will post the
updated patch once I have finished. Thanks for your help!

Best Regards
Haitao Shan

Keir Fraser wrote:
> On 31/3/08 15:25, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> 
>>> So, if you leave the driver as it is (triggering the software LSC
>>> interrupt), do APIC EOI in Xen before executing the interrupt
>>> handler in dom0, and do not mask the MSI at all, then you no longer
>>> hang? 
>> 
>> I usuually do experiement in linux kernel, and it no longer hang.
> 
> Well, I'd be okay with an initial implementation which does not allow
> Xen to mask MSIs. But still I think it will be cleaner and more
> extensible to have Xen program the MSI registers anyway. This will
> hide details like interrupt vector, APIC destination mode, etc. from
> the MSI-capable guest, and also will make it easier to support things
> like changing interrupt affinity on the fly (since it will not be
> necessary to get dom0 involved in that). 
> 
> Once you have Xen able to write the MSI registers, I suppose it is
> not much extra work to implement some kind of interrupt mitigation
> scheme involving mask/enable bits of the MSI configuration register.
> 
>  -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Add MSI support to XEN
  2008-03-27  6:55 [PATCH 0/5] Add MSI support to XEN Shan, Haitao
  2008-03-27  7:56 ` Keir Fraser
  2008-03-27 17:32 ` Espen Skoglund
@ 2008-04-02 14:55 ` Neil Turton
  2008-04-03 12:11   ` Shan, Haitao
  2 siblings, 1 reply; 22+ messages in thread
From: Neil Turton @ 2008-04-02 14:55 UTC (permalink / raw)
  To: Shan, Haitao
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin, Jiang, Yunhong

I tried this patch and MSI seems to work fine with a driver in DOM0.  It
didn't work with MSI-X though because pci_vector_resources returned 8
and I have 10 MSI capable devices in the machine.  I've only got 6
Phys-irq interrupts listed in /proc/interrupts so I'd expect there to be
more vectors free.  I applied the debugging patch below and got the
following output.

diff -r 9bb373519b68 arch/i386/pci/irq-xen.c
--- a/arch/i386/pci/irq-xen.c	Tue Apr 01 14:15:23 2008 +0100
+++ b/arch/i386/pci/irq-xen.c	Wed Apr 02 13:19:05 2008 +0100
@@ -1192,6 +1192,7 @@ int pci_vector_resources(int last, int n
 	int offset = (last % 8);

 	while (next < FIRST_SYSTEM_VECTOR) {
+		printk("next=%d count=%d\n", next, count);
 		next += 8;
 #ifdef CONFIG_X86_64
 		if (next == IA32_SYSCALL_VECTOR)

[pci_vector_resources(176, 1) called]
next=176 count=1
next=184 count=2
next=192 count=3
next=200 count=4
next=208 count=5
next=216 count=6
next=224 count=7
next=232 count=8
[pci_vector_resources returned 8]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/5] Add MSI support to XEN
  2008-04-02 14:55 ` Neil Turton
@ 2008-04-03 12:11   ` Shan, Haitao
  2008-04-03 12:31     ` Keir Fraser
  0 siblings, 1 reply; 22+ messages in thread
From: Shan, Haitao @ 2008-04-03 12:11 UTC (permalink / raw)
  To: Neil Turton
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin, Jiang, Yunhong

Hi, Neil

Thanks for trying the patches. The problem is caused by an incompatibility between Xen and the Dom0 kernel.
pci_vector_resources() calculates the number of available vectors. Xen assigns vectors starting at vector 0x20 with offset = 0, which confuses the code in pci_vector_resources().
Maybe we should replace the function with a hypercall that returns the number of available vectors.
What do you think about it, Keir?
Thanks!

Shan Haitao

-----Original Message-----
From: Neil Turton [mailto:nturton@solarflare.com] 
Sent: April 2, 2008 22:56
To: Shan, Haitao
Cc: Keir Fraser; xen-devel; Tian, Kevin; Jiang, Yunhong; Li, Xin B
Subject: Re: [Xen-devel] [PATCH 0/5] Add MSI support to XEN

I tried this patch and MSI seems to work fine with a driver in DOM0.  It
didn't work with MSI-X though because pci_vector_resources returned 8
and I have 10 MSI capable devices in the machine.  I've only got 6
Phys-irq interrupts listed in /proc/interrupts so I'd expect there to be
more vectors free.  I applied the debugging patch below and got the
following output.

diff -r 9bb373519b68 arch/i386/pci/irq-xen.c
--- a/arch/i386/pci/irq-xen.c	Tue Apr 01 14:15:23 2008 +0100
+++ b/arch/i386/pci/irq-xen.c	Wed Apr 02 13:19:05 2008 +0100
@@ -1192,6 +1192,7 @@ int pci_vector_resources(int last, int n
 	int offset = (last % 8);

 	while (next < FIRST_SYSTEM_VECTOR) {
+		printk("next=%d count=%d\n", next, count);
 		next += 8;
 #ifdef CONFIG_X86_64
 		if (next == IA32_SYSCALL_VECTOR)

[pci_vector_resources(176, 1) called]
next=176 count=1
next=184 count=2
next=192 count=3
next=200 count=4
next=208 count=5
next=216 count=6
next=224 count=7
next=232 count=8
[pci_vector_resources returned 8]


* Re: [PATCH 0/5] Add MSI support to XEN
  2008-04-03 12:11   ` Shan, Haitao
@ 2008-04-03 12:31     ` Keir Fraser
  0 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2008-04-03 12:31 UTC (permalink / raw)
  To: Shan, Haitao, Neil Turton
  Cc: Keir Fraser, xen-devel, Li, Xin B, Tian, Kevin, Jiang, Yunhong

On 3/4/08 13:11, "Shan, Haitao" <haitao.shan@intel.com> wrote:

> Thanks for trying the patches. The problem is caused by incompatibility
> between Xen and Dom0 kernel.
> Pci_vector_resources is to calculate available vectors. Xen assigns vector by
> start with vector 0x20 and offset = 0. This will confuse the code in
> pci_vector_resources.
> Maybe we should replace the function with a hypercall to acquire the number of
> available vectors.
> How do you think about it, Keir?

I may not understand the issue here, but in principle I do not particularly
want to have anything outside Xen handling real IRQ vectors. In which case
this confusion should not exist in the first place? I know the last round of
patches did have dom0 poking the MSI registers, and hence it knew about real
vectors, but that's being changed in the next round, right?

 -- Keir



Thread overview: 22+ messages
2008-03-27  6:55 [PATCH 0/5] Add MSI support to XEN Shan, Haitao
2008-03-27  7:56 ` Keir Fraser
2008-03-27 17:32 ` Espen Skoglund
2008-03-27 22:09   ` Caitlin Bestler
2008-03-28  1:48   ` Jiang, Yunhong
2008-03-28  7:24     ` Keir Fraser
2008-03-28  8:40       ` Jiang, Yunhong
2008-03-28  9:16         ` Keir Fraser
2008-03-28  9:35           ` Jiang, Yunhong
2008-03-31 13:57           ` Jiang, Yunhong
2008-03-31 14:14             ` Keir Fraser
2008-03-31 14:15               ` Keir Fraser
2008-03-31 14:25               ` Jiang, Yunhong
2008-03-31 14:33                 ` Keir Fraser
2008-04-01  2:39                   ` Shan, Haitao
2008-03-28 11:37       ` Espen Skoglund
2008-03-28 11:53         ` Keir Fraser
2008-03-28 12:15           ` Espen Skoglund
2008-03-28 13:00             ` Keir Fraser
2008-04-02 14:55 ` Neil Turton
2008-04-03 12:11   ` Shan, Haitao
2008-04-03 12:31     ` Keir Fraser
