[Qemu-devel] [question] e1000 interrupt storm happened because of its corresponding ioapic->irr bit always set

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [question] e1000 interrupt storm happened because of its corresponding ioapic->irr bit always set
@ 2014-08-23 10:36 Zhang Haoyu
  2014-08-25  3:07 ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-23 10:36 UTC (permalink / raw)
  To: qemu-devel, kvm

Hi, all

I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().

Any ideas?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened because of its corresponding ioapic->irr bit always set
  2014-08-23 10:36 [Qemu-devel] [question] e1000 interrupt storm happened because of its corresponding ioapic->irr bit always set Zhang Haoyu
@ 2014-08-25  3:07 ` Jason Wang
  2014-08-25  7:17   ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof " Zhang Haoyu
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2014-08-25  3:07 UTC (permalink / raw)
  To: Zhang Haoyu, qemu-devel, kvm

On 08/23/2014 06:36 PM, Zhang Haoyu wrote:
> Hi, all
>
> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>
> Any ideas?

We meet this several times: search the autoneg patches for an example of
workaround for this in qemu, and patch kvm: ioapic: conditionally delay
irq delivery during eoi broadcast for an workaround in kvm (rejected).

That was probably caused by something wrong in e1000 emulation which
causes interrupt to be injected into windows guest before its interrupt
handler is registered. And Windows guest does not have a mechanism to
detect and disable irq in such condition.

e1000 emulation is far from stable and complete (e.g run e1000 ethtool
selftest in linux guest may see lots of errors). It's complicate and
subtle (even has undocumented registers and behaviour). You should
better consider to use virtio which are more stable and fast in a kvm
guest (unless some intel guys are involved to improve e1000 emulation).

Thanks
>
> Thanks,
> Zhang Haoyu
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr bit always set
  2014-08-25  3:07 ` Jason Wang
@ 2014-08-25  7:17   ` Zhang Haoyu
  2014-08-25  7:29     ` Jason Wang
  2014-08-25  7:32     ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr " Jason Wang
  0 siblings, 2 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-25  7:17 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, kvm

>> Hi, all
>>
>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>
>> Any ideas?
>
>We meet this several times: search the autoneg patches for an example of
>workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>irq delivery during eoi broadcast for an workaround in kvm (rejected).
>
Thanks, Jason,
I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
which one tries to fix this problem, or all of them?

>That was probably caused by something wrong in e1000 emulation which
>causes interrupt to be injected into windows guest before its interrupt
>handler is registered. And Windows guest does not have a mechanism to
>detect and disable irq in such condition.
>
Sorry, I don't understand,
I think one interrupt should not been enabled before its handler is successfully registered, 
is it possible that e1000 emulation inject the interrupt before the interrupt is succesfully enabled?

Thanks,
Zhang Haoyu
 
>e1000 emulation is far from stable and complete (e.g run e1000 ethtool
>selftest in linux guest may see lots of errors). It's complicate and
>subtle (even has undocumented registers and behaviour). You should
>better consider to use virtio which are more stable and fast in a kvm
>guest (unless some intel guys are involved to improve e1000 emulation).
>
>Thanks
>>
>> Thanks,
>> Zhang Haoyu
>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr bit always set
  2014-08-25  7:17   ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof " Zhang Haoyu
@ 2014-08-25  7:29     ` Jason Wang
  2014-08-25  8:27       ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr " Zhang Haoyu
  2014-08-26  9:28       ` Zhang Haoyu
  2014-08-25  7:32     ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr " Jason Wang
  1 sibling, 2 replies; 18+ messages in thread
From: Jason Wang @ 2014-08-25  7:29 UTC (permalink / raw)
  To: Zhang Haoyu, qemu-devel, kvm

On 08/25/2014 03:17 PM, Zhang Haoyu wrote:
>>> Hi, all
>>>
>>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>>
>>> Any ideas?
>> We meet this several times: search the autoneg patches for an example of
>> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>>
> Thanks, Jason,
> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007

This series is the first try to fix the guest hang during guest
hibernation or driver enable/disable.
> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351

Those are follow-up that tries to fix the bugs introduced by the autoneg
hack.
> which one tries to fix this problem, or all of them?

As you can see, those kinds of hacking may not as good as we expect
since we don't know exactly how e1000 works. Only the register function
description from Intel's manual may not be sufficient. And you can
search e1000 in the archives and you can find some behaviour of e1000
registers were not fictionalized like what spec said. It was really
suggested to use virtio-net instead of e1000 in guest. 
>
>> That was probably caused by something wrong in e1000 emulation which
>> causes interrupt to be injected into windows guest before its interrupt
>> handler is registered. And Windows guest does not have a mechanism to
>> detect and disable irq in such condition.
>>
> Sorry, I don't understand,
> I think one interrupt should not been enabled before its handler is successfully registered, 
> is it possible that e1000 emulation inject the interrupt before the interrupt is succesfully enabled?
>
> Thanks,
> Zhang Haoyu
>  
>> e1000 emulation is far from stable and complete (e.g run e1000 ethtool
>> selftest in linux guest may see lots of errors). It's complicate and
>> subtle (even has undocumented registers and behaviour). You should
>> better consider to use virtio which are more stable and fast in a kvm
>> guest (unless some intel guys are involved to improve e1000 emulation).
>>
>> Thanks
>>> Thanks,
>>> Zhang Haoyu
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr bit always set
  2014-08-25  7:29     ` Jason Wang
@ 2014-08-25  8:27       ` Zhang Haoyu
  2014-08-26  9:28       ` Zhang Haoyu
  1 sibling, 0 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-25  8:27 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, kvm

>>>> Hi, all
>>>>
>>>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>>>
>>>> Any ideas?
>>> We meet this several times: search the autoneg patches for an example of
>>> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>>> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>>>
>> Thanks, Jason,
>> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
>> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
>
>This series is the first try to fix the guest hang during guest
>hibernation or driver enable/disable.
>> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
>> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
>
>Those are follow-up that tries to fix the bugs introduced by the autoneg
>hack.
>> which one tries to fix this problem, or all of them?
>
>As you can see, those kinds of hacking may not as good as we expect
>since we don't know exactly how e1000 works. Only the register function
>description from Intel's manual may not be sufficient. And you can
>search e1000 in the archives and you can find some behaviour of e1000
>registers were not fictionalized like what spec said. It was really
>suggested to use virtio-net instead of e1000 in guest. 

We support both, virtio-net is the recommended option, 
with regard to some guest (e.g., windows server 2000), virtio-net is not supported, e1000 is the last option.
 
>>
>>> That was probably caused by something wrong in e1000 emulation which
>>> causes interrupt to be injected into windows guest before its interrupt
>>> handler is registered. And Windows guest does not have a mechanism to
>>> detect and disable irq in such condition.
>>>
>> Sorry, I don't understand,
>> I think one interrupt should not been enabled before its handler is successfully registered, 
>> is it possible that e1000 emulation inject the interrupt before the interrupt is succesfully enabled?
>>
>> Thanks,
>> Zhang Haoyu
>>  
>>> e1000 emulation is far from stable and complete (e.g run e1000 ethtool
>>> selftest in linux guest may see lots of errors). It's complicate and
>>> subtle (even has undocumented registers and behaviour). You should
>>> better consider to use virtio which are more stable and fast in a kvm
>>> guest (unless some intel guys are involved to improve e1000 emulation).
>>>
>>> Thanks
>>>> Thanks,
>>>> Zhang Haoyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr bit always set
  2014-08-25  7:29     ` Jason Wang
  2014-08-25  8:27       ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr " Zhang Haoyu
@ 2014-08-26  9:28       ` Zhang Haoyu
  2014-08-27  5:09         ` Jason Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-26  9:28 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, kvm

>>>> Hi, all
>>>>
>>>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>>>
>>>> Any ideas?
>>> We meet this several times: search the autoneg patches for an example of
>>> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>>> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>>>
>> Thanks, Jason,
>> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
>> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
>
>This series is the first try to fix the guest hang during guest
>hibernation or driver enable/disable.
>> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
>> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
>
>Those are follow-up that tries to fix the bugs introduced by the autoneg
>hack.
>> which one tries to fix this problem, or all of them?
>
>As you can see, those kinds of hacking may not as good as we expect
>since we don't know exactly how e1000 works. Only the register function
>description from Intel's manual may not be sufficient. And you can
>search e1000 in the archives and you can find some behaviour of e1000
>registers were not fictionalized like what spec said. It was really
>suggested to use virtio-net instead of e1000 in guest. 
>>
Will the "[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast" add delay to virtual interrupt injection sometimes,
then some time delay sensitive applications will be impacted?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr bit always set
  2014-08-26  9:28       ` Zhang Haoyu
@ 2014-08-27  5:09         ` Jason Wang
  2014-08-27  9:31           ` [Qemu-devel] [question] e1000 interrupt storm happened becauseofits " Zhang Haoyu
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2014-08-27  5:09 UTC (permalink / raw)
  To: Zhang Haoyu, qemu-devel, kvm

On 08/26/2014 05:28 PM, Zhang Haoyu wrote:
>>>>> Hi, all
>>>>>
>>>>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>>>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>>>>
>>>>> Any ideas?
>>>> We meet this several times: search the autoneg patches for an example of
>>>> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>>>> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>>>>
>>> Thanks, Jason,
>>> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
>> This series is the first try to fix the guest hang during guest
>> hibernation or driver enable/disable.
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
>> Those are follow-up that tries to fix the bugs introduced by the autoneg
>> hack.
>>> which one tries to fix this problem, or all of them?
>> As you can see, those kinds of hacking may not as good as we expect
>> since we don't know exactly how e1000 works. Only the register function
>> description from Intel's manual may not be sufficient. And you can
>> search e1000 in the archives and you can find some behaviour of e1000
>> registers were not fictionalized like what spec said. It was really
>> suggested to use virtio-net instead of e1000 in guest. 
> Will the "[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast" add delay to virtual interrupt injection sometimes,
> then some time delay sensitive applications will be impacted?

I don't test it too much but it only give a minor delay of 1% irq in the
hope of guest irq handler will be registered shortly. But I suspect it's
the bug of e1000 who inject the irq in the wrong time. Under what cases
did you meet this issue?
>
> Thanks,
> Zhang Haoyu
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseofits correspondingioapic->irr bit always set
  2014-08-27  5:09         ` Jason Wang
@ 2014-08-27  9:31           ` Zhang Haoyu
  2014-08-28  7:12             ` Jason Wang
  2014-08-28 12:55             ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu
  0 siblings, 2 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-27  9:31 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, kvm

>>>>>> Hi, all
>>>>>>
>>>>>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>>>>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>>>>>
>>>>>> Any ideas?
>>>>> We meet this several times: search the autoneg patches for an example of
>>>>> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>>>>> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>>>>>
>>>> Thanks, Jason,
>>>> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
>>> This series is the first try to fix the guest hang during guest
>>> hibernation or driver enable/disable.
>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
>>> Those are follow-up that tries to fix the bugs introduced by the autoneg
>>> hack.
>>>> which one tries to fix this problem, or all of them?
>>> As you can see, those kinds of hacking may not as good as we expect
>>> since we don't know exactly how e1000 works. Only the register function
>>> description from Intel's manual may not be sufficient. And you can
>>> search e1000 in the archives and you can find some behaviour of e1000
>>> registers were not fictionalized like what spec said. It was really
>>> suggested to use virtio-net instead of e1000 in guest. 
>> Will the "[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast" add delay to virtual interrupt injection sometimes,
>> then some time delay sensitive applications will be impacted?
>
>I don't test it too much but it only give a minor delay of 1% irq in the
>hope of guest irq handler will be registered shortly. But I suspect it's
>the bug of e1000 who inject the irq in the wrong time. Under what cases
>did you meet this issue?
Some scenarios, not constant and 100% reproducity, 
e.g., reboot vm, ifdown e1000 nic, install kaspersky(network configuration is performed during installing stage), .etc.

Thanks,
Zhang Haoyu

>>
>> Thanks,
>> Zhang Haoyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseofits correspondingioapic->irr bit always set
  2014-08-27  9:31           ` [Qemu-devel] [question] e1000 interrupt storm happened becauseofits " Zhang Haoyu
@ 2014-08-28  7:12             ` Jason Wang
  2014-08-28 12:55             ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu
  1 sibling, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-08-28  7:12 UTC (permalink / raw)
  To: Zhang Haoyu, qemu-devel, kvm

On 08/27/2014 05:31 PM, Zhang Haoyu wrote:
>>>>>>> Hi, all
>>>>>>> >>>>>>
>>>>>>> >>>>>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>>>>>> >>>>>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>>>>>> >>>>>>
>>>>>>> >>>>>> Any ideas?
>>>>>> >>>>> We meet this several times: search the autoneg patches for an example of
>>>>>> >>>>> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>>>>>> >>>>> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>>>>>> >>>>>
>>>>> >>>> Thanks, Jason,
>>>>> >>>> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
>>>>> >>>> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
>>>> >>> This series is the first try to fix the guest hang during guest
>>>> >>> hibernation or driver enable/disable.
>>>>> >>>> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
>>>>> >>>> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
>>>> >>> Those are follow-up that tries to fix the bugs introduced by the autoneg
>>>> >>> hack.
>>>>> >>>> which one tries to fix this problem, or all of them?
>>>> >>> As you can see, those kinds of hacking may not as good as we expect
>>>> >>> since we don't know exactly how e1000 works. Only the register function
>>>> >>> description from Intel's manual may not be sufficient. And you can
>>>> >>> search e1000 in the archives and you can find some behaviour of e1000
>>>> >>> registers were not fictionalized like what spec said. It was really
>>>> >>> suggested to use virtio-net instead of e1000 in guest. 
>>> >> Will the "[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi broadcast" add delay to virtual interrupt injection sometimes,
>>> >> then some time delay sensitive applications will be impacted?
>> >
>> >I don't test it too much but it only give a minor delay of 1% irq in the
>> >hope of guest irq handler will be registered shortly. But I suspect it's
>> >the bug of e1000 who inject the irq in the wrong time. Under what cases
>> >did you meet this issue?
> Some scenarios, not constant and 100% reproducity, 
> e.g., reboot vm, ifdown e1000 nic, install kaspersky(network configuration is performed during installing stage), .etc.

If you want to start the debugging, I suggest to enable e1000 debug and
then analysis the log before the interrupt storm. This may help to
locate the issue.

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr bit always set
  2014-08-27  9:31           ` [Qemu-devel] [question] e1000 interrupt storm happened becauseofits " Zhang Haoyu
  2014-08-28  7:12             ` Jason Wang
@ 2014-08-28 12:55             ` Zhang Haoyu
  2014-08-29  2:50               ` Jason Wang
                                 ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-28 12:55 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, kvm

Hi Jason,
I tested below patch, it's okay, the e1000 interrupt storm disappeared.
But I am going to make a bit change on it, could you help review it?

>Currently, we call ioapic_service() immediately when we find the irq is still
>active during eoi broadcast. But for real hardware, there's some dealy between
>the EOI writing and irq delivery (system bus latency?). So we need to emulate
>this behavior. Otherwise, for a guest who haven't register a proper irq handler
>, it would stay in the interrupt routine as this irq would be re-injected
>immediately after guest enables interrupt. This would lead guest can't move
>forward and may miss the possibility to get proper irq handler registered (one
>example is windows guest resuming from hibernation).
>
>As there's no way to differ the unhandled irq from new raised ones, this patch
>solve this problems by scheduling a delayed work when the count of irq injected
>during eoi broadcast exceeds a threshold value. After this patch, the guest can
>move a little forward when there's no suitable irq handler in case it may
>register one very soon and for guest who has a bad irq detection routine ( such
>as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>past.
>
>Signed-off-by: Jason Wang <jasowang <at> redhat.com>
>---
> virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
> virt/kvm/ioapic.h |    2 ++
> 2 files changed, 47 insertions(+), 2 deletions(-)
>
>diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>index dcaf272..892253e 100644
>--- a/virt/kvm/ioapic.c
>+++ b/virt/kvm/ioapic.c
> <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
> 	return ret;
> }
>
>+static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>+{
>+	int i, ret;
>+	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>+						 eoi_inject.work);
>+	spin_lock(&ioapic->lock);
>+	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>+		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>+
>+		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>+			continue;
>+
>+		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>+			ret = ioapic_service(ioapic, i);
>+	}
>+	spin_unlock(&ioapic->lock);
>+}
>+
> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> 				     int trigger_mode)
> {
> <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>
> 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> 		ent->fields.remote_irr = 0;
>-		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>-			ioapic_service(ioapic, i);
>+		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>+			++ioapic->irq_eoi;
-+			++ioapic->irq_eoi;
++		    ++ioapic->irq_eoi[i];
>+			if (ioapic->irq_eoi == 100) {
-+			if (ioapic->irq_eoi == 100) {
++			if (ioapic->irq_eoi[i] == 100) {
>+				/*
>+				 * Real hardware does not deliver the irq so
>+				 * immediately during eoi broadcast, so we need
>+				 * to emulate this behavior. Otherwise, for
>+				 * guests who has not registered handler of a
>+				 * level irq, this irq would be injected
>+				 * immediately after guest enables interrupt
>+				 * (which happens usually at the end of the
>+				 * common interrupt routine). This would lead
>+				 * guest can't move forward and may miss the
>+				 * possibility to get proper irq handler
>+				 * registered. So we need to give some breath to
>+				 * guest. TODO: 1 is too long?
>+				 */
>+				schedule_delayed_work(&ioapic->eoi_inject, 1);
>+				ioapic->irq_eoi = 0;
-+				ioapic->irq_eoi = 0;
++				ioapic->irq_eoi[i] = 0;
>+			} else {
>+				ioapic_service(ioapic, i);
>+			}
>+		}
++		else {
++			ioapic->irq_eoi[i] = 0;
++		}
> 	}
> }
I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
and not too long, ioapic->irq_eoi will reach to 100.
I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
Any ideas?

Zhang Haoyu
>
> <at>  <at>  -375,12 +414,14  <at>  <at>  void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
> {
> 	int i;
>
>+	cancel_delayed_work_sync(&ioapic->eoi_inject);
> 	for (i = 0; i < IOAPIC_NUM_PINS; i++)
> 		ioapic->redirtbl[i].fields.mask = 1;
> 	ioapic->base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
> 	ioapic->ioregsel = 0;
> 	ioapic->irr = 0;
> 	ioapic->id = 0;
>+	ioapic->irq_eoi = 0;
-+	ioapic->irq_eoi = 0;
++	memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS);
> 	update_handled_vectors(ioapic);
> }
>
> <at>  <at>  -398,6 +439,7  <at>  <at>  int kvm_ioapic_init(struct kvm *kvm)
> 	if (!ioapic)
> 		return -ENOMEM;
> 	spin_lock_init(&ioapic->lock);
>+	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
> 	kvm->arch.vioapic = ioapic;
> 	kvm_ioapic_reset(ioapic);
> 	kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops);
> <at>  <at>  -418,6 +460,7  <at>  <at>  void kvm_ioapic_destroy(struct kvm *kvm)
> {
> 	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
>
>+	cancel_delayed_work_sync(&ioapic->eoi_inject);
> 	if (ioapic) {
> 		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
> 		kvm->arch.vioapic = NULL;
>diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
>index 0b190c3..8938e66 100644
>--- a/virt/kvm/ioapic.h
>+++ b/virt/kvm/ioapic.h
> <at>  <at>  -47,6 +47,8  <at>  <at>  struct kvm_ioapic {
> 	void (*ack_notifier)(void *opaque, int irq);
> 	spinlock_t lock;
> 	DECLARE_BITMAP(handled_vectors, 256);
>+	struct delayed_work eoi_inject;
>+	u32 irq_eoi;
-+	u32 irq_eoi;
++	u32 irq_eoi[IOAPIC_NUM_PINS];
> };
>
> #ifdef DEBUG

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr bit always set
  2014-08-28 12:55             ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu
@ 2014-08-29  2:50               ` Jason Wang
  2014-08-29  3:14               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr " Zhang Haoyu
  2014-09-02 15:44               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr " Michael S. Tsirkin
  2 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-08-29  2:50 UTC (permalink / raw)
  To: Zhang Haoyu, qemu-devel, kvm

On 08/28/2014 08:55 PM, Zhang Haoyu wrote:
> Hi Jason,
> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
> But I am going to make a bit change on it, could you help review it?
>
>> Currently, we call ioapic_service() immediately when we find the irq is still
>> active during eoi broadcast. But for real hardware, there's some dealy between
>> the EOI writing and irq delivery (system bus latency?). So we need to emulate
>> this behavior. Otherwise, for a guest who haven't register a proper irq handler
>> , it would stay in the interrupt routine as this irq would be re-injected
>> immediately after guest enables interrupt. This would lead guest can't move
>> forward and may miss the possibility to get proper irq handler registered (one
>> example is windows guest resuming from hibernation).
>>
>> As there's no way to differ the unhandled irq from new raised ones, this patch
>> solve this problems by scheduling a delayed work when the count of irq injected
>> during eoi broadcast exceeds a threshold value. After this patch, the guest can
>> move a little forward when there's no suitable irq handler in case it may
>> register one very soon and for guest who has a bad irq detection routine ( such
>> as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>> past.
>>
>> Signed-off-by: Jason Wang <jasowang <at> redhat.com>
>> ---
>> virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>> virt/kvm/ioapic.h |    2 ++
>> 2 files changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>> index dcaf272..892253e 100644
>> --- a/virt/kvm/ioapic.c
>> +++ b/virt/kvm/ioapic.c
>> <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>> 	return ret;
>> }
>>
>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>> +{
>> +	int i, ret;
>> +	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>> +						 eoi_inject.work);
>> +	spin_lock(&ioapic->lock);
>> +	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>> +		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>> +
>> +		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>> +			continue;
>> +
>> +		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>> +			ret = ioapic_service(ioapic, i);
>> +	}
>> +	spin_unlock(&ioapic->lock);
>> +}
>> +
>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>> 				     int trigger_mode)
>> {
>> <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>
>> 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>> 		ent->fields.remote_irr = 0;
>> -		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>> -			ioapic_service(ioapic, i);
>> +		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>> +			++ioapic->irq_eoi;
> -+			++ioapic->irq_eoi;
> ++		    ++ioapic->irq_eoi[i];
>> +			if (ioapic->irq_eoi == 100) {
> -+			if (ioapic->irq_eoi == 100) {
> ++			if (ioapic->irq_eoi[i] == 100) {
>> +				/*
>> +				 * Real hardware does not deliver the irq so
>> +				 * immediately during eoi broadcast, so we need
>> +				 * to emulate this behavior. Otherwise, for
>> +				 * guests who has not registered handler of a
>> +				 * level irq, this irq would be injected
>> +				 * immediately after guest enables interrupt
>> +				 * (which happens usually at the end of the
>> +				 * common interrupt routine). This would lead
>> +				 * guest can't move forward and may miss the
>> +				 * possibility to get proper irq handler
>> +				 * registered. So we need to give some breath to
>> +				 * guest. TODO: 1 is too long?
>> +				 */
>> +				schedule_delayed_work(&ioapic->eoi_inject, 1);
>> +				ioapic->irq_eoi = 0;
> -+				ioapic->irq_eoi = 0;
> ++				ioapic->irq_eoi[i] = 0;
>> +			} else {
>> +				ioapic_service(ioapic, i);
>> +			}
>> +		}
> ++		else {
> ++			ioapic->irq_eoi[i] = 0;
> ++		}
>> 	}
>> }
> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
> and not too long, ioapic->irq_eoi will reach to 100.
> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
> Any ideas?
>
> Zhang Haoyu

I don't have objection to this per irq counter, but I don't see obvious
difference.

Btw, the patch has been rejected in the past since we're not sure
whether this behaviour is architectural (or ask Intel?). You need
convince the Gleb and other kvm maintainers for this idea first :).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set
  2014-08-28 12:55             ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu
  2014-08-29  2:50               ` Jason Wang
@ 2014-08-29  3:14               ` Zhang Haoyu
  2014-08-29  4:07                 ` Zhang, Yang Z
  2014-09-02 15:44               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr " Michael S. Tsirkin
  2 siblings, 1 reply; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-29  3:14 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, kvm, Gleb Natapov, Michael S.Tsirkin,
	Zhang, Yang Z

Hi, Yang, Gleb, Michael,
Could you help review below patch please?

Thanks,
Zhang Haoyu

>> Hi Jason,
>> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>> But I am going to make a bit change on it, could you help review it?
>>
>>> Currently, we call ioapic_service() immediately when we find the irq is still
>>> active during eoi broadcast. But for real hardware, there's some dealy between
>>> the EOI writing and irq delivery (system bus latency?). So we need to emulate
>>> this behavior. Otherwise, for a guest who haven't register a proper irq handler
>>> , it would stay in the interrupt routine as this irq would be re-injected
>>> immediately after guest enables interrupt. This would lead guest can't move
>>> forward and may miss the possibility to get proper irq handler registered (one
>>> example is windows guest resuming from hibernation).
>>>
>>> As there's no way to differ the unhandled irq from new raised ones, this patch
>>> solve this problems by scheduling a delayed work when the count of irq injected
>>> during eoi broadcast exceeds a threshold value. After this patch, the guest can
>>> move a little forward when there's no suitable irq handler in case it may
>>> register one very soon and for guest who has a bad irq detection routine ( such
>>> as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>>> past.
>>>
>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com>
>>> ---
>>> virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>>> virt/kvm/ioapic.h |    2 ++
>>> 2 files changed, 47 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>> index dcaf272..892253e 100644
>>> --- a/virt/kvm/ioapic.c
>>> +++ b/virt/kvm/ioapic.c
>>> <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>>> 	return ret;
>>> }
>>>
>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>>> +{
>>> +	int i, ret;
>>> +	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>>> +						 eoi_inject.work);
>>> +	spin_lock(&ioapic->lock);
>>> +	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>> +		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>>> +
>>> +		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>>> +			continue;
>>> +
>>> +		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>>> +			ret = ioapic_service(ioapic, i);
>>> +	}
>>> +	spin_unlock(&ioapic->lock);
>>> +}
>>> +
>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>> 				     int trigger_mode)
>>> {
>>> <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>>
>>> 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>> 		ent->fields.remote_irr = 0;
>>> -		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>>> -			ioapic_service(ioapic, i);
>>> +		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>>> +			++ioapic->irq_eoi;
>> -+			++ioapic->irq_eoi;
>> ++		    ++ioapic->irq_eoi[i];
>>> +			if (ioapic->irq_eoi == 100) {
>> -+			if (ioapic->irq_eoi == 100) {
>> ++			if (ioapic->irq_eoi[i] == 100) {
>>> +				/*
>>> +				 * Real hardware does not deliver the irq so
>>> +				 * immediately during eoi broadcast, so we need
>>> +				 * to emulate this behavior. Otherwise, for
>>> +				 * guests who has not registered handler of a
>>> +				 * level irq, this irq would be injected
>>> +				 * immediately after guest enables interrupt
>>> +				 * (which happens usually at the end of the
>>> +				 * common interrupt routine). This would lead
>>> +				 * guest can't move forward and may miss the
>>> +				 * possibility to get proper irq handler
>>> +				 * registered. So we need to give some breath to
>>> +				 * guest. TODO: 1 is too long?
>>> +				 */
>>> +				schedule_delayed_work(&ioapic->eoi_inject, 1);
>>> +				ioapic->irq_eoi = 0;
>> -+				ioapic->irq_eoi = 0;
>> ++				ioapic->irq_eoi[i] = 0;
>>> +			} else {
>>> +				ioapic_service(ioapic, i);
>>> +			}
>>> +		}
>> ++		else {
>> ++			ioapic->irq_eoi[i] = 0;
>> ++		}
>>> 	}
>>> }
>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
>> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
>> and not too long, ioapic->irq_eoi will reach to 100.
>> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
>> Any ideas?
>>
>> Zhang Haoyu
>
>I don't have objection to this per irq counter, but I don't see obvious
>difference.
>
>Btw, the patch has been rejected in the past since we're not sure
>whether this behaviour is architectural (or ask Intel?). You need
>convince the Gleb and other kvm maintainers for this idea first :).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set
  2014-08-29  3:14               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr " Zhang Haoyu
@ 2014-08-29  4:07                 ` Zhang, Yang Z
  2014-08-29  4:28                   ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Yang Z @ 2014-08-29  4:07 UTC (permalink / raw)
  To: Zhang Haoyu, Jason Wang, qemu-devel, kvm, Gleb Natapov,
	Michael S.Tsirkin

Zhang Haoyu wrote on 2014-08-29:
> Hi, Yang, Gleb, Michael,
> Could you help review below patch please?

I don't quite understand the background. Why ioacpi->irr is setting before EOI? It should be driver's responsibility to clear the interrupt before issuing EOI.

> 
> Thanks,
> Zhang Haoyu
> 
>>> Hi Jason,
>>> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>>> But I am going to make a bit change on it, could you help review it?
>>> 
>>>> Currently, we call ioapic_service() immediately when we find the irq
>>>> is still active during eoi broadcast. But for real hardware, there's
>>>> some dealy between the EOI writing and irq delivery (system bus
>>>> latency?). So we need to emulate this behavior. Otherwise, for a
>>>> guest who haven't register a proper irq handler , it would stay in
>>>> the interrupt routine as this irq would be re-injected immediately
>>>> after guest enables interrupt. This would lead guest can't move
>>>> forward and may miss the possibility to get proper irq handler
>>>> registered (one example is windows guest resuming from hibernation).
>>>> 
>>>> As there's no way to differ the unhandled irq from new raised ones,
>>>> this patch solve this problems by scheduling a delayed work when the
>>>> count of irq injected during eoi broadcast exceeds a threshold value.
>>>> After this patch, the guest can move a little forward when there's no
>>>> suitable irq handler in case it may register one very soon and for
>>>> guest who has a bad irq detection routine ( such as note_interrupt()
>>>> in linux ), this bad irq would be recognized soon as in the past.
>>>> 
>>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com> ---
>>>> virt/kvm/ioapic.c |   47
>>>> +++++++++++++++++++++++++++++++++++++++++++++-- virt/kvm/ioapic.h |  
>>>>  2 ++ 2 files changed, 47 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index
>>>> dcaf272..892253e 100644 --- a/virt/kvm/ioapic.c +++
>>>> b/virt/kvm/ioapic.c <at>  <at>  -221,6 +221,24  <at>  <at>  int
>>>> kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>>>> 	return ret; }
>>>> 
>>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{
>>>> +	int i, ret; +	struct kvm_ioapic *ioapic = container_of(work, struct
>>>> kvm_ioapic, +						 eoi_inject.work); +	spin_lock(&ioapic->lock);
>>>> +	for (i = 0; i < IOAPIC_NUM_PINS; i++) { +		union
>>>> kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + +		if
>>>> (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) +			continue; + +		if
>>>> (ioapic->irr & (1 << i) && !ent->fields.remote_irr) +			ret =
>>>> ioapic_service(ioapic, i); +	} +	spin_unlock(&ioapic->lock); +} +
>>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int
>>>> vector, 				     int trigger_mode) { <at>  <at>  -249,8 +267,29  <at>
>>>>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic,
>>>> int vector,
>>>> 
>>>> 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>>> 		ent->fields.remote_irr = 0;
>>>> -		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>>>> -			ioapic_service(ioapic, i);
>>>> +		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>>>> +			++ioapic->irq_eoi;
>>> -+			++ioapic->irq_eoi;
>>> ++		    ++ioapic->irq_eoi[i];
>>>> +			if (ioapic->irq_eoi == 100) {
>>> -+			if (ioapic->irq_eoi == 100) {
>>> ++			if (ioapic->irq_eoi[i] == 100) {
>>>> +				/*
>>>> +				 * Real hardware does not deliver the irq so
>>>> +				 * immediately during eoi broadcast, so we need
>>>> +				 * to emulate this behavior. Otherwise, for
>>>> +				 * guests who has not registered handler of a
>>>> +				 * level irq, this irq would be injected
>>>> +				 * immediately after guest enables interrupt
>>>> +				 * (which happens usually at the end of the
>>>> +				 * common interrupt routine). This would lead
>>>> +				 * guest can't move forward and may miss the
>>>> +				 * possibility to get proper irq handler
>>>> +				 * registered. So we need to give some breath to
>>>> +				 * guest. TODO: 1 is too long?
>>>> +				 */
>>>> +				schedule_delayed_work(&ioapic->eoi_inject, 1);
>>>> +				ioapic->irq_eoi = 0;
>>> -+				ioapic->irq_eoi = 0;
>>> ++				ioapic->irq_eoi[i] = 0;
>>>> +			} else {
>>>> +				ioapic_service(ioapic, i);
>>>> +			}
>>>> +		}
>>> ++		else {
>>> ++			ioapic->irq_eoi[i] = 0;
>>> ++		}
>>>> 	}
>>>> }
>>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi
>>> broadcast, it's possible that another interrupt's (not current eoi's
>>> corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow
>>> continually, and not too long, ioapic->irq_eoi will reach to 100. I
>>> want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
>>> Any ideas?
>>> 
>>> Zhang Haoyu
>> 
>> I don't have objection to this per irq counter, but I don't see obvious
>> difference.
>> 
>> Btw, the patch has been rejected in the past since we're not sure
>> whether this behaviour is architectural (or ask Intel?). You need
>> convince the Gleb and other kvm maintainers for this idea first :).


Best regards,
Yang



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set
  2014-08-29  4:07                 ` Zhang, Yang Z
@ 2014-08-29  4:28                   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-08-29  4:28 UTC (permalink / raw)
  To: Zhang, Yang Z, Zhang Haoyu, qemu-devel, kvm, Gleb Natapov,
	Michael S.Tsirkin

On 08/29/2014 12:07 PM, Zhang, Yang Z wrote:
> Zhang Haoyu wrote on 2014-08-29:
>> > Hi, Yang, Gleb, Michael,
>> > Could you help review below patch please?
> I don't quite understand the background. Why ioacpi->irr is setting before EOI? It should be driver's responsibility to clear the interrupt before issuing EOI.
>

This may happen when a interrupt was injected to guest when its irq
handler (driver) was not registered. So irr was still set even during
EOI broadcast, and then this irq will be injected to guest immediately.
This may cause a dead loop for guest who does not have the ability to
detect and disable interrupt storm.

This may be a bug of device model, but we want to know in real cpu, is
there a small time gap between the finish of EOI broadcast and the
interrupt raised by EOI? If yes, looks like KVM should emulate this
behaviour?

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr bit always set
  2014-08-28 12:55             ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu
  2014-08-29  2:50               ` Jason Wang
  2014-08-29  3:14               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr " Zhang Haoyu
@ 2014-09-02 15:44               ` Michael S. Tsirkin
  2014-09-04  1:56                 ` [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits " Zhang Haoyu
  2 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-09-02 15:44 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: Jason Wang, qemu-devel, kvm

On Thu, Aug 28, 2014 at 08:55:18PM +0800, Zhang Haoyu wrote:
> Hi Jason,
> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
> But I am going to make a bit change on it, could you help review it?
> 
> >Currently, we call ioapic_service() immediately when we find the irq is still
> >active during eoi broadcast. But for real hardware, there's some dealy between
> >the EOI writing and irq delivery (system bus latency?). So we need to emulate
> >this behavior. Otherwise, for a guest who haven't register a proper irq handler
> >, it would stay in the interrupt routine as this irq would be re-injected
> >immediately after guest enables interrupt. This would lead guest can't move
> >forward and may miss the possibility to get proper irq handler registered (one
> >example is windows guest resuming from hibernation).
> >
> >As there's no way to differ the unhandled irq from new raised ones, this patch
> >solve this problems by scheduling a delayed work when the count of irq injected
> >during eoi broadcast exceeds a threshold value. After this patch, the guest can
> >move a little forward when there's no suitable irq handler in case it may
> >register one very soon and for guest who has a bad irq detection routine ( such
> >as note_interrupt() in linux ), this bad irq would be recognized soon as in the
> >past.
> >
> >Signed-off-by: Jason Wang <jasowang <at> redhat.com>
> >---
> > virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
> > virt/kvm/ioapic.h |    2 ++
> > 2 files changed, 47 insertions(+), 2 deletions(-)
> >
> >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> >index dcaf272..892253e 100644
> >--- a/virt/kvm/ioapic.c
> >+++ b/virt/kvm/ioapic.c
> > <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
> > 	return ret;
> > }
> >
> >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
> >+{
> >+	int i, ret;
> >+	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
> >+						 eoi_inject.work);
> >+	spin_lock(&ioapic->lock);
> >+	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> >+		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
> >+
> >+		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
> >+			continue;
> >+
> >+		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
> >+			ret = ioapic_service(ioapic, i);
> >+	}
> >+	spin_unlock(&ioapic->lock);
> >+}
> >+
> > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> > 				     int trigger_mode)
> > {
> > <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> >
> > 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> > 		ent->fields.remote_irr = 0;
> >-		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
> >-			ioapic_service(ioapic, i);
> >+		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
> >+			++ioapic->irq_eoi;
> -+			++ioapic->irq_eoi;
> ++		    ++ioapic->irq_eoi[i];
> >+			if (ioapic->irq_eoi == 100) {
> -+			if (ioapic->irq_eoi == 100) {
> ++			if (ioapic->irq_eoi[i] == 100) {
> >+				/*
> >+				 * Real hardware does not deliver the irq so
> >+				 * immediately during eoi broadcast, so we need
> >+				 * to emulate this behavior. Otherwise, for
> >+				 * guests who has not registered handler of a
> >+				 * level irq, this irq would be injected
> >+				 * immediately after guest enables interrupt
> >+				 * (which happens usually at the end of the
> >+				 * common interrupt routine). This would lead
> >+				 * guest can't move forward and may miss the
> >+				 * possibility to get proper irq handler
> >+				 * registered. So we need to give some breath to
> >+				 * guest. TODO: 1 is too long?
> >+				 */
> >+				schedule_delayed_work(&ioapic->eoi_inject, 1);
> >+				ioapic->irq_eoi = 0;
> -+				ioapic->irq_eoi = 0;
> ++				ioapic->irq_eoi[i] = 0;
> >+			} else {
> >+				ioapic_service(ioapic, i);
> >+			}
> >+		}
> ++		else {
> ++			ioapic->irq_eoi[i] = 0;
> ++		}
> > 	}
> > }
> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
> and not too long, ioapic->irq_eoi will reach to 100.
> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
> Any ideas?
> 
> Zhang Haoyu

I'm a bit concerned how this will affect realtime guests.
Worth adding a flag to enable this, so that e.g. virtio is not
affected?


> >
> > <at>  <at>  -375,12 +414,14  <at>  <at>  void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
> > {
> > 	int i;
> >
> >+	cancel_delayed_work_sync(&ioapic->eoi_inject);
> > 	for (i = 0; i < IOAPIC_NUM_PINS; i++)
> > 		ioapic->redirtbl[i].fields.mask = 1;
> > 	ioapic->base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
> > 	ioapic->ioregsel = 0;
> > 	ioapic->irr = 0;
> > 	ioapic->id = 0;
> >+	ioapic->irq_eoi = 0;
> -+	ioapic->irq_eoi = 0;
> ++	memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS);
> > 	update_handled_vectors(ioapic);
> > }
> >
> > <at>  <at>  -398,6 +439,7  <at>  <at>  int kvm_ioapic_init(struct kvm *kvm)
> > 	if (!ioapic)
> > 		return -ENOMEM;
> > 	spin_lock_init(&ioapic->lock);
> >+	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
> > 	kvm->arch.vioapic = ioapic;
> > 	kvm_ioapic_reset(ioapic);
> > 	kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops);
> > <at>  <at>  -418,6 +460,7  <at>  <at>  void kvm_ioapic_destroy(struct kvm *kvm)
> > {
> > 	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
> >
> >+	cancel_delayed_work_sync(&ioapic->eoi_inject);
> > 	if (ioapic) {
> > 		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
> > 		kvm->arch.vioapic = NULL;
> >diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
> >index 0b190c3..8938e66 100644
> >--- a/virt/kvm/ioapic.h
> >+++ b/virt/kvm/ioapic.h
> > <at>  <at>  -47,6 +47,8  <at>  <at>  struct kvm_ioapic {
> > 	void (*ack_notifier)(void *opaque, int irq);
> > 	spinlock_t lock;
> > 	DECLARE_BITMAP(handled_vectors, 256);
> >+	struct delayed_work eoi_inject;
> >+	u32 irq_eoi;
> -+	u32 irq_eoi;
> ++	u32 irq_eoi[IOAPIC_NUM_PINS];
> > };
> >
> > #ifdef DEBUG
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits correspondingioapic->irr bit always set
  2014-09-02 15:44               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr " Michael S. Tsirkin
@ 2014-09-04  1:56                 ` Zhang Haoyu
  2014-09-04  4:57                   ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang Haoyu @ 2014-09-04  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, kvm

>> Hi Jason,
>> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>> But I am going to make a bit change on it, could you help review it?
>> 
>> >Currently, we call ioapic_service() immediately when we find the irq is still
>> >active during eoi broadcast. But for real hardware, there's some dealy between
>> >the EOI writing and irq delivery (system bus latency?). So we need to emulate
>> >this behavior. Otherwise, for a guest who haven't register a proper irq handler
>> >, it would stay in the interrupt routine as this irq would be re-injected
>> >immediately after guest enables interrupt. This would lead guest can't move
>> >forward and may miss the possibility to get proper irq handler registered (one
>> >example is windows guest resuming from hibernation).
>> >
>> >As there's no way to differ the unhandled irq from new raised ones, this patch
>> >solve this problems by scheduling a delayed work when the count of irq injected
>> >during eoi broadcast exceeds a threshold value. After this patch, the guest can
>> >move a little forward when there's no suitable irq handler in case it may
>> >register one very soon and for guest who has a bad irq detection routine ( such
>> >as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>> >past.
>> >
>> >Signed-off-by: Jason Wang <jasowang <at> redhat.com>
>> >---
>> > virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>> > virt/kvm/ioapic.h |    2 ++
>> > 2 files changed, 47 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>> >index dcaf272..892253e 100644
>> >--- a/virt/kvm/ioapic.c
>> >+++ b/virt/kvm/ioapic.c
>> > <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>> > 	return ret;
>> > }
>> >
>> >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>> >+{
>> >+	int i, ret;
>> >+	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>> >+						 eoi_inject.work);
>> >+	spin_lock(&ioapic->lock);
>> >+	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>> >+		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>> >+
>> >+		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>> >+			continue;
>> >+
>> >+		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>> >+			ret = ioapic_service(ioapic, i);
>> >+	}
>> >+	spin_unlock(&ioapic->lock);
>> >+}
>> >+
>> > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>> > 				     int trigger_mode)
>> > {
>> > <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>> >
>> > 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>> > 		ent->fields.remote_irr = 0;
>> >-		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>> >-			ioapic_service(ioapic, i);
>> >+		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>> >+			++ioapic->irq_eoi;
>> -+			++ioapic->irq_eoi;
>> ++		    ++ioapic->irq_eoi[i];
>> >+			if (ioapic->irq_eoi == 100) {
>> -+			if (ioapic->irq_eoi == 100) {
>> ++			if (ioapic->irq_eoi[i] == 100) {
>> >+				/*
>> >+				 * Real hardware does not deliver the irq so
>> >+				 * immediately during eoi broadcast, so we need
>> >+				 * to emulate this behavior. Otherwise, for
>> >+				 * guests who has not registered handler of a
>> >+				 * level irq, this irq would be injected
>> >+				 * immediately after guest enables interrupt
>> >+				 * (which happens usually at the end of the
>> >+				 * common interrupt routine). This would lead
>> >+				 * guest can't move forward and may miss the
>> >+				 * possibility to get proper irq handler
>> >+				 * registered. So we need to give some breath to
>> >+				 * guest. TODO: 1 is too long?
>> >+				 */
>> >+				schedule_delayed_work(&ioapic->eoi_inject, 1);
>> >+				ioapic->irq_eoi = 0;
>> -+				ioapic->irq_eoi = 0;
>> ++				ioapic->irq_eoi[i] = 0;
>> >+			} else {
>> >+				ioapic_service(ioapic, i);
>> >+			}
>> >+		}
>> ++		else {
>> ++			ioapic->irq_eoi[i] = 0;
>> ++		}
>> > 	}
>> > }
>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
>> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
>> and not too long, ioapic->irq_eoi will reach to 100.
>> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
>> Any ideas?
>> 
>> Zhang Haoyu
>
>I'm a bit concerned how this will affect realtime guests.
>Worth adding a flag to enable this, so that e.g. virtio is not
>affected?
>
Your concern is reasonable.
If applying Jason's original patch, sometimes the virtio's interrupt delay is more than 4ms(my host's HZ=250), 
but very rarely happened.
And with my above change on it(per irq counter instead of total irq counter), the delayed virtio interrupt is more rarely happened,
then I use 1000 instead of 100 on "if (ioapic->irq_eoi[i] == 1000)",  I made a test for 10min, the delayed virtio interrupt has not happened.

Thanks,
Zhang Haoyu

>
>> >
>> > <at>  <at>  -375,12 +414,14  <at>  <at>  void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
>> > {
>> > 	int i;
>> >
>> >+	cancel_delayed_work_sync(&ioapic->eoi_inject);
>> > 	for (i = 0; i < IOAPIC_NUM_PINS; i++)
>> > 		ioapic->redirtbl[i].fields.mask = 1;
>> > 	ioapic->base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
>> > 	ioapic->ioregsel = 0;
>> > 	ioapic->irr = 0;
>> > 	ioapic->id = 0;
>> >+	ioapic->irq_eoi = 0;
>> -+	ioapic->irq_eoi = 0;
>> ++	memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS);
>> > 	update_handled_vectors(ioapic);
>> > }
>> >
>> > <at>  <at>  -398,6 +439,7  <at>  <at>  int kvm_ioapic_init(struct kvm *kvm)
>> > 	if (!ioapic)
>> > 		return -ENOMEM;
>> > 	spin_lock_init(&ioapic->lock);
>> >+	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
>> > 	kvm->arch.vioapic = ioapic;
>> > 	kvm_ioapic_reset(ioapic);
>> > 	kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops);
>> > <at>  <at>  -418,6 +460,7  <at>  <at>  void kvm_ioapic_destroy(struct kvm *kvm)
>> > {
>> > 	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
>> >
>> >+	cancel_delayed_work_sync(&ioapic->eoi_inject);
>> > 	if (ioapic) {
>> > 		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
>> > 		kvm->arch.vioapic = NULL;
>> >diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
>> >index 0b190c3..8938e66 100644
>> >--- a/virt/kvm/ioapic.h
>> >+++ b/virt/kvm/ioapic.h
>> > <at>  <at>  -47,6 +47,8  <at>  <at>  struct kvm_ioapic {
>> > 	void (*ack_notifier)(void *opaque, int irq);
>> > 	spinlock_t lock;
>> > 	DECLARE_BITMAP(handled_vectors, 256);
>> >+	struct delayed_work eoi_inject;
>> >+	u32 irq_eoi;
>> -+	u32 irq_eoi;
>> ++	u32 irq_eoi[IOAPIC_NUM_PINS];
>> > };
>> >
>> > #ifdef DEBUG
>> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits correspondingioapic->irr bit always set
  2014-09-04  1:56                 ` [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits " Zhang Haoyu
@ 2014-09-04  4:57                   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-09-04  4:57 UTC (permalink / raw)
  To: Zhang Haoyu, Michael S. Tsirkin; +Cc: qemu-devel, kvm

On 09/04/2014 09:56 AM, Zhang Haoyu wrote:
>>> Hi Jason,
>>> >> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>>> >> But I am going to make a bit change on it, could you help review it?
>>> >> 
>>>> >> >Currently, we call ioapic_service() immediately when we find the irq is still
>>>> >> >active during eoi broadcast. But for real hardware, there's some dealy between
>>>> >> >the EOI writing and irq delivery (system bus latency?). So we need to emulate
>>>> >> >this behavior. Otherwise, for a guest who haven't register a proper irq handler
>>>> >> >, it would stay in the interrupt routine as this irq would be re-injected
>>>> >> >immediately after guest enables interrupt. This would lead guest can't move
>>>> >> >forward and may miss the possibility to get proper irq handler registered (one
>>>> >> >example is windows guest resuming from hibernation).
>>>> >> >
>>>> >> >As there's no way to differ the unhandled irq from new raised ones, this patch
>>>> >> >solve this problems by scheduling a delayed work when the count of irq injected
>>>> >> >during eoi broadcast exceeds a threshold value. After this patch, the guest can
>>>> >> >move a little forward when there's no suitable irq handler in case it may
>>>> >> >register one very soon and for guest who has a bad irq detection routine ( such
>>>> >> >as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>>>> >> >past.
>>>> >> >
>>>> >> >Signed-off-by: Jason Wang <jasowang <at> redhat.com>
>>>> >> >---
>>>> >> > virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>>>> >> > virt/kvm/ioapic.h |    2 ++
>>>> >> > 2 files changed, 47 insertions(+), 2 deletions(-)
>>>> >> >
>>>> >> >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>>> >> >index dcaf272..892253e 100644
>>>> >> >--- a/virt/kvm/ioapic.c
>>>> >> >+++ b/virt/kvm/ioapic.c
>>>> >> > <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>>>> >> > 	return ret;
>>>> >> > }
>>>> >> >
>>>> >> >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>>>> >> >+{
>>>> >> >+	int i, ret;
>>>> >> >+	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>>>> >> >+						 eoi_inject.work);
>>>> >> >+	spin_lock(&ioapic->lock);
>>>> >> >+	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>>> >> >+		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>>>> >> >+
>>>> >> >+		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>>>> >> >+			continue;
>>>> >> >+
>>>> >> >+		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>>>> >> >+			ret = ioapic_service(ioapic, i);
>>>> >> >+	}
>>>> >> >+	spin_unlock(&ioapic->lock);
>>>> >> >+}
>>>> >> >+
>>>> >> > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>>> >> > 				     int trigger_mode)
>>>> >> > {
>>>> >> > <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>>> >> >
>>>> >> > 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>>> >> > 		ent->fields.remote_irr = 0;
>>>> >> >-		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>>>> >> >-			ioapic_service(ioapic, i);
>>>> >> >+		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>>>> >> >+			++ioapic->irq_eoi;
>>> >> -+			++ioapic->irq_eoi;
>>> >> ++		    ++ioapic->irq_eoi[i];
>>>> >> >+			if (ioapic->irq_eoi == 100) {
>>> >> -+			if (ioapic->irq_eoi == 100) {
>>> >> ++			if (ioapic->irq_eoi[i] == 100) {
>>>> >> >+				/*
>>>> >> >+				 * Real hardware does not deliver the irq so
>>>> >> >+				 * immediately during eoi broadcast, so we need
>>>> >> >+				 * to emulate this behavior. Otherwise, for
>>>> >> >+				 * guests who has not registered handler of a
>>>> >> >+				 * level irq, this irq would be injected
>>>> >> >+				 * immediately after guest enables interrupt
>>>> >> >+				 * (which happens usually at the end of the
>>>> >> >+				 * common interrupt routine). This would lead
>>>> >> >+				 * guest can't move forward and may miss the
>>>> >> >+				 * possibility to get proper irq handler
>>>> >> >+				 * registered. So we need to give some breath to
>>>> >> >+				 * guest. TODO: 1 is too long?
>>>> >> >+				 */
>>>> >> >+				schedule_delayed_work(&ioapic->eoi_inject, 1);
>>>> >> >+				ioapic->irq_eoi = 0;
>>> >> -+				ioapic->irq_eoi = 0;
>>> >> ++				ioapic->irq_eoi[i] = 0;
>>>> >> >+			} else {
>>>> >> >+				ioapic_service(ioapic, i);
>>>> >> >+			}
>>>> >> >+		}
>>> >> ++		else {
>>> >> ++			ioapic->irq_eoi[i] = 0;
>>> >> ++		}
>>>> >> > 	}
>>>> >> > }
>>> >> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
>>> >> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
>>> >> and not too long, ioapic->irq_eoi will reach to 100.
>>> >> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
>>> >> Any ideas?
>>> >> 
>>> >> Zhang Haoyu
>> >
>> >I'm a bit concerned how this will affect realtime guests.
>> >Worth adding a flag to enable this, so that e.g. virtio is not
>> >affected?
>> >
> Your concern is reasonable.
> If applying Jason's original patch, sometimes the virtio's interrupt delay is more than 4ms(my host's HZ=250), 
> but very rarely happened.
> And with my above change on it(per irq counter instead of total irq counter), the delayed virtio interrupt is more rarely happened,
> then I use 1000 instead of 100 on "if (ioapic->irq_eoi[i] == 1000)",  I made a test for 10min, the delayed virtio interrupt has not happened.
>
> Thanks,
> Zhang Haoyu
>

I agree 100 is too aggressive here. Probably you may use a number even
much higher than 1000.

One more thing, may worth to add a tracepoint also if we really want this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr bit always set
  2014-08-25  7:17   ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof " Zhang Haoyu
  2014-08-25  7:29     ` Jason Wang
@ 2014-08-25  7:32     ` Jason Wang
  1 sibling, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-08-25  7:32 UTC (permalink / raw)
  To: Zhang Haoyu, qemu-devel, kvm

On 08/25/2014 03:17 PM, Zhang Haoyu wrote:
>>> Hi, all
>>> >>
>>> >> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC interrupt storm, 
>>> >> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always true in __kvm_ioapic_update_eoi().
>>> >>
>>> >> Any ideas?
>> >
>> >We meet this several times: search the autoneg patches for an example of
>> >workaround for this in qemu, and patch kvm: ioapic: conditionally delay
>> >irq delivery during eoi broadcast for an workaround in kvm (rejected).
>> >
> Thanks, Jason,
> I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches, 
> http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
> http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
> http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
> which one tries to fix this problem, or all of them?
>
>> >That was probably caused by something wrong in e1000 emulation which
>> >causes interrupt to be injected into windows guest before its interrupt
>> >handler is registered. And Windows guest does not have a mechanism to
>> >detect and disable irq in such condition.
>> >
> Sorry, I don't understand,
> I think one interrupt should not been enabled before its handler is successfully registered, 
> is it possible that e1000 emulation inject the interrupt before the interrupt is succesfully enabled?

There's no way for qemu to know whether or not the irq handler was
registered in guest. So if qemu behaves differently with a physical
card, it may lead the interrupt to be injected into guest too early. You
can search redhat bugzilla for lots of related bugs, some even with
in-depth analysis.

Thanks
>
> Thanks,
> Zhang Haoyu
>  

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-09-04  4:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-23 10:36 [Qemu-devel] [question] e1000 interrupt storm happened because of its corresponding ioapic->irr bit always set Zhang Haoyu
2014-08-25  3:07 ` Jason Wang
2014-08-25  7:17   ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof " Zhang Haoyu
2014-08-25  7:29     ` Jason Wang
2014-08-25  8:27       ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr " Zhang Haoyu
2014-08-26  9:28       ` Zhang Haoyu
2014-08-27  5:09         ` Jason Wang
2014-08-27  9:31           ` [Qemu-devel] [question] e1000 interrupt storm happened becauseofits " Zhang Haoyu
2014-08-28  7:12             ` Jason Wang
2014-08-28 12:55             ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu
2014-08-29  2:50               ` Jason Wang
2014-08-29  3:14               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr " Zhang Haoyu
2014-08-29  4:07                 ` Zhang, Yang Z
2014-08-29  4:28                   ` Jason Wang
2014-09-02 15:44               ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr " Michael S. Tsirkin
2014-09-04  1:56                 ` [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits " Zhang Haoyu
2014-09-04  4:57                   ` Jason Wang
2014-08-25  7:32     ` [Qemu-devel] [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr " Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).