* [Qemu-devel] Network connections stalling (due to lost interrupts/ticks?)
@ 2007-08-02 15:56 Charles Duffy
2007-08-02 22:06 ` [Qemu-devel] " Charles Duffy
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Charles Duffy @ 2007-08-02 15:56 UTC (permalink / raw)
To: qemu-devel
I'm trying to use qemu to test an install process which involves quite a
bit of downloading. Everything starts up fine (using either ne2k_pci or
rtl8139 hardware), but the multi-GB download typically stalls out about
100-400MB in. Is there anything I can do to prevent this?
There's a warning on startup that the system can't set a 1024Hz timer,
which persists even after I set /proc/sys/dev/rtc/max-user-freq to 1024,
and I occasionally get warnings at runtime ("Your time source seems to
be instable or some driver is hogging interrupts").
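For readers reproducing the startup warning: the limit in /proc/sys/dev/rtc/max-user-freq caps the periodic interrupt rate an unprivileged process may request from the RTC, and it is typically 64 unless raised. The following is a minimal sketch of checking that limit against the 1024Hz mentioned above. The function names are hypothetical and this is not qemu's own code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the text content of max-user-freq.
 * Returns the frequency in Hz, or -1 if the text is not a sane positive integer. */
int parse_max_user_freq(const char *text)
{
    char *end;
    long hz = strtol(text, &end, 10);

    if (end == text || hz <= 0 || hz > 1000000)
        return -1;
    return (int)hz;
}

/* Read the limit from a proc-style file. Returns -1 on any failure. */
int read_max_user_freq(const char *path)
{
    char buf[32];
    int hz = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        hz = parse_max_user_freq(buf);
    fclose(f);
    return hz;
}

/* Returns 1 when an unprivileged process may request wanted_hz, else 0. */
int rtc_rate_permitted(int max_user_hz, int wanted_hz)
{
    return max_user_hz > 0 && wanted_hz <= max_user_hz;
}
```

Calling read_max_user_freq("/proc/sys/dev/rtc/max-user-freq") and passing the result to rtc_rate_permitted(..., 1024) shows whether the limit itself is the obstacle. If the limit is already 1024 and the warning persists, the cause lies elsewhere.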
* [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
2007-08-02 15:56 [Qemu-devel] Network connections stalling (due to lost interrupts/ticks?) Charles Duffy
@ 2007-08-02 22:06 ` Charles Duffy
2007-08-03 12:18 ` Jason Wessel
2007-08-03 1:48 ` [Qemu-devel] Re: Network connections stalling Charles Duffy
2007-08-03 2:28 ` Charles Duffy
2 siblings, 1 reply; 10+ messages in thread
From: Charles Duffy @ 2007-08-02 22:06 UTC (permalink / raw)
To: qemu-devel
Charles Duffy wrote:
> There's a warning on startup that the system can't set a 1024Hz timer,
> which persists even after I set /proc/sys/dev/rtc/max-user-freq to 1024,
> and I occasionally get warnings at runtime ("Your time source seems to
> be instable or some driver is hogging interrupts").
This was happening because my host kernel was compiled with
CONFIG_HPET_RTC_IRQ=y. I've disabled this option, recompiled and
rebooted, and it resolved the RTC warning (and apparently, the unstable
time source messages) -- but my network connections are still stalling.
* [Qemu-devel] Re: Network connections stalling
2007-08-02 15:56 [Qemu-devel] Network connections stalling (due to lost interrupts/ticks?) Charles Duffy
2007-08-02 22:06 ` [Qemu-devel] " Charles Duffy
@ 2007-08-03 1:48 ` Charles Duffy
2007-08-03 2:28 ` Charles Duffy
2 siblings, 0 replies; 10+ messages in thread
From: Charles Duffy @ 2007-08-03 1:48 UTC (permalink / raw)
To: qemu-devel
This loss of connectivity is still happening; at least within my
environment, it's quite reproducible. While I've eliminated the message
about being unable to set the RTC to 1024Hz by recompiling my host
kernel without CONFIG_HPET_RTC_IRQ, the host's kernel is frequently
complaining "rtc: lost some interrupts at 1024Hz".
I'm using tap-based networking (-net nic -net
tap,ifname=tap0,script=./tap0-start), where tap0 is bridged to my local
ethernet device, and outgoing network traffic from the qemu instance is
visible if I monitor traffic on the bridge.
Indeed, external responses to such traffic (e.g. ARP responses) are
visible on the bridge, but evidently not to the virtual machine: the TX
counter rises when I try to ping the outside world, but the RX counter
does not. Resetting the device (ifconfig eth0 down; ifconfig eth0 up)
brings it back alive -- but this isn't an action which can easily be
slotted into a simple, single-threaded install script.
These same scripts have been tested on real hardware, and have not
turned up this issue. Certainly, a race condition unrelated to qemu is
possible, but I'd appreciate any insight 'yall can provide. Thank you!
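The symptom described above (TX counter rising while RX stays flat) can be checked mechanically inside the guest. Below is a rough sketch that parses one interface line in the /proc/net/dev layout and compares two snapshots. The struct and function names are made up for illustration, and the field order assumed is the usual one: eight receive fields followed by the transmit fields.

```c
#include <stdio.h>
#include <string.h>

struct if_counters {
    unsigned long long rx_packets;
    unsigned long long tx_packets;
};

/* Parse one interface line in /proc/net/dev layout.
 * Returns 0 and fills *out if the line belongs to ifname, -1 otherwise. */
int parse_netdev_line(const char *line, const char *ifname,
                      struct if_counters *out)
{
    const char *name = line;
    const char *colon = strchr(line, ':');
    size_t len;

    if (colon == NULL)
        return -1;
    while (*name == ' ')
        name++;
    len = (size_t)(colon - name);
    if (len != strlen(ifname) || strncmp(name, ifname, len) != 0)
        return -1;

    /* Skip rx bytes, read rx packets, skip the six remaining receive
     * fields and tx bytes, then read tx packets. */
    if (sscanf(colon + 1, "%*s %llu %*s %*s %*s %*s %*s %*s %*s %llu",
               &out->rx_packets, &out->tx_packets) != 2)
        return -1;
    return 0;
}

/* Returns 1 when packets were sent between the two snapshots
 * but nothing was received, 0 otherwise. */
int rx_stalled(const struct if_counters *before,
               const struct if_counters *after)
{
    return after->tx_packets > before->tx_packets &&
           after->rx_packets == before->rx_packets;
}
```

Taking a snapshot before and after a short burst of pings gives a yes/no answer that a script could act on, for example by cycling the interface as described above.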
* [Qemu-devel] Re: Network connections stalling
2007-08-02 15:56 [Qemu-devel] Network connections stalling (due to lost interrupts/ticks?) Charles Duffy
2007-08-02 22:06 ` [Qemu-devel] " Charles Duffy
2007-08-03 1:48 ` [Qemu-devel] Re: Network connections stalling Charles Duffy
@ 2007-08-03 2:28 ` Charles Duffy
2 siblings, 0 replies; 10+ messages in thread
From: Charles Duffy @ 2007-08-03 2:28 UTC (permalink / raw)
To: qemu-devel
This appears to only happen with -net tap; I cannot reproduce the issue
with -net user.
* Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
2007-08-02 22:06 ` [Qemu-devel] " Charles Duffy
@ 2007-08-03 12:18 ` Jason Wessel
2007-08-03 19:48 ` Charles Duffy
2007-08-06 19:48 ` Charles Duffy
0 siblings, 2 replies; 10+ messages in thread
From: Jason Wessel @ 2007-08-03 12:18 UTC (permalink / raw)
To: charles, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]
Charles,
Are you willing to try an experimental patch?
Perhaps you could try the attached patch and post back if it happens to
solve your problem. There is most definitely a problem where qemu can
get hung up indefinitely after an "interrupt storm". I never
submitted it because there is no clean way to do this via the opaque
information that is passed around. It seems wrong to have to make the
ioapic a global. If this does fix the problem perhaps someone will
decide to fix this up in a cleaner fashion via the opaque structures.
Jason.
Charles Duffy wrote:
> Charles Duffy wrote:
>
>> There's a warning on startup that the system can't set a 1024Hz timer,
>> which persists even after I set /proc/sys/dev/rtc/max-user-freq to 1024,
>> and I occasionally get warnings at runtime ("Your time source seems to
>> be instable or some driver is hogging interrupts").
>>
>
> This was happening because my host kernel was compiled with
> CONFIG_HPET_RTC_IRQ=y. I've disabled this option, recompiled and
> rebooted, and it resolved the RTC warning (and apparently, the unstable
> time source messages) -- but my network connections are still stalling.
>
>
>
>
[-- Attachment #2: io_apic_eoi_fix.patch --]
[-- Type: text/x-patch, Size: 1706 bytes --]
Recover from an interrupt flood by propagating the end of interrupt state.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

---
 hw/apic.c |   23 +++++++++++++++++++++--
 hw/pc.c   |    2 +-
 2 files changed, 22 insertions(+), 3 deletions(-)

Index: qemu/hw/apic.c
===================================================================
--- qemu.orig/hw/apic.c
+++ qemu/hw/apic.c
@@ -332,6 +332,26 @@ static void apic_set_irq(APICState *s, i
     apic_update_irq(s);
 }

+struct IOAPICState *ioapic;
+/* XXX Multi IOAPIC support */
+static void apic_propogate_eoi(int vector) {
+    uint32_t irr;
+    int pin;
+
+    if ((vector < 0x10) || (vector > 0xfe))
+        return;
+
+    irr = ioapic->irr;
+    while (irr) {
+        pin = ffs_bit(irr);
+        irr &= ~(1 << pin);
+        if ((ioapic->ioredtbl[pin] & 0xff) == vector) {
+            ioapic->irr &= ~(1 << pin);
+            break;
+        }
+    }
+}
+
 static void apic_eoi(APICState *s)
 {
     int isrv;
@@ -339,8 +359,7 @@ static void apic_eoi(APICState *s)
     if (isrv < 0)
         return;
     reset_bit(s->isr, isrv);
-    /* XXX: send the EOI packet to the APIC bus to allow the I/O APIC to
-            set the remote IRR bit for level triggered interrupts. */
+    apic_propogate_eoi(isrv);
     apic_update_irq(s);
 }

Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c
+++ qemu/hw/pc.c
@@ -36,7 +36,7 @@
 static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
-static IOAPICState *ioapic;
+extern IOAPICState *ioapic;
 static PCIDevice *i440fx_state;

 static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
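To make the effect of the patch easier to follow outside the qemu tree, here is a standalone model of the same idea: on EOI for a vector, find the pending I/O APIC pin whose redirection entry carries that vector in its low 8 bits and clear its IRR bit. The names ioapic_model, lowest_set_bit and propagate_eoi are hypothetical stand-ins, the 24-pin count is an assumption, and this is a sketch of the technique rather than qemu's actual code.

```c
#include <stdint.h>

#define IOAPIC_NUM_PINS 24  /* assumed pin count for this model */

struct ioapic_model {
    uint32_t irr;                        /* one pending bit per pin */
    uint64_t ioredtbl[IOAPIC_NUM_PINS];  /* low 8 bits hold the vector */
};

/* Index of the lowest set bit; v must be nonzero. */
static int lowest_set_bit(uint32_t v)
{
    int pin = 0;

    while ((v & 1u) == 0) {
        v >>= 1;
        pin++;
    }
    return pin;
}

/* Clear the pending bit of the first pin routed to `vector`.
 * Returns the pin that was cleared, or -1 if nothing matched. */
int propagate_eoi(struct ioapic_model *io, int vector)
{
    uint32_t pending = io->irr & ((1u << IOAPIC_NUM_PINS) - 1);

    /* Same vector range check as the patch. */
    if (vector < 0x10 || vector > 0xfe)
        return -1;

    while (pending) {
        int pin = lowest_set_bit(pending);

        pending &= ~(1u << pin);
        if ((int)(io->ioredtbl[pin] & 0xff) == vector) {
            io->irr &= ~(1u << pin);
            return pin;
        }
    }
    return -1;
}
```

Unlike the patch, the model takes the I/O APIC state as a parameter instead of a global, which is the cleaner shape the message above hopes for.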
* Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
@ 2007-08-03 13:48 n schembr
2007-08-03 15:02 ` Jason Wessel
0 siblings, 1 reply; 10+ messages in thread
From: n schembr @ 2007-08-03 13:48 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1743 bytes --]
I'm seeing the same rtc error but my systems are not hanging. I can still get to them and they seem to handle a good load from time to time, 4 running proc.
Is this a stability or performance issue?
If it is a stability issue how do I test it?
----- Original Message ----
From: Jason Wessel <jason.wessel@windriver.com>
To: charles@dyfis.net; qemu-devel@nongnu.org
Sent: Friday, August 3, 2007 8:18:50 AM
Subject: Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
Charles,
Are you willing to try an experimental patch?
Perhaps you could try the attached patch and post back if it happens to
solve your problem. There is most definitely a problem where qemu can
get hung up indefinitely after an "interrupt storm". I had not ever
submitted it because there is no clean way to do this via the opaque
information that is passed around. It seems wrong to have to make the
ioapic a global. If this does fix the problem perhaps someone will
decide to fix this up in a cleaner fashion via the opaque structures.
Jason.
Charles Duffy wrote:
> Charles Duffy wrote:
>
>> There's a warning on startup that the system can't set a 1024Hz timer,
>> which persists even after I set /proc/sys/dev/rtc/max-user-freq to 1024,
>> and I occasionally get warnings at runtime ("Your time source seems to
>> be instable or some driver is hogging interrupts").
>>
>
> This was happening because my host kernel was compiled with
> CONFIG_HPET_RTC_IRQ=y. I've disabled this option, recompiled and
> rebooted, and it resolved the RTC warning (and apparently, the unstable
> time source messages) -- but my network connections are still stalling.
>
>
>
>
* Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
2007-08-03 13:48 [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?) n schembr
@ 2007-08-03 15:02 ` Jason Wessel
2007-08-04 2:48 ` Luke -Jr
0 siblings, 1 reply; 10+ messages in thread
From: Jason Wessel @ 2007-08-03 15:02 UTC (permalink / raw)
To: qemu-devel
The RTC message has nothing to do with the interrupt controller load.
The patch I mentioned was aimed at stability/bug fix. Nothing to do
with performance whatsoever.
A simple test that usually breaks the qemu interrupt controller is to
run "ping -f" against the target when using TAP. Then just run some
other processes on the target, or try to use the network with telnet,
or write to the disk with echo file > blah ; sync... It usually doesn't
last too long. It is the "ping -f" that keeps the interrupt load
at the max.
Jason.
n schembr wrote:
> I'm seeing the same rtc error but my systems are not hanging. I can
> still get to them and they seem to handle a good load from time to
> time, 4 running proc.
>
> Is this a stability or performance issue?
>
> If it is a stability issue how do I test it?
>
> ----- Original Message ----
> From: Jason Wessel <jason.wessel@windriver.com>
> To: charles@dyfis.net; qemu-devel@nongnu.org
> Sent: Friday, August 3, 2007 8:18:50 AM
> Subject: Re: [Qemu-devel] Re: Network connections stalling (due to
> lost interrupts/ticks?)
>
> Charles,
>
> Are you willing to try an experimental patch?
>
> Perhaps you could try the attached patch and post back if it happens to
> solve your problem. There is most definitely a problem where qemu can
> get hung up indefinitely after an "interrupt storm". I had not ever
> submitted it because there is no clean way to do this via the opaque
> information that is passed around. It seems wrong to have to make the
> ioapic a global. If this does fix the problem perhaps someone will
> decide to fix this up in a cleaner fashion via the opaque structures.
>
> Jason.
>
> Charles Duffy wrote:
> > Charles Duffy wrote:
> >
> >> There's a warning on startup that the system can't set a 1024Hz timer,
> >> which persists even after I set /proc/sys/dev/rtc/max-user-freq to
> 1024,
> >> and I occasionally get warnings at runtime ("Your time source seems to
> >> be instable or some driver is hogging interrupts").
> >>
> >
> > This was happening because my host kernel was compiled with
> > CONFIG_HPET_RTC_IRQ=y. I've disabled this option, recompiled and
> > rebooted, and it resolved the RTC warning (and apparently, the unstable
> > time source messages) -- but my network connections are still stalling.
> >
> >
> >
> >
>
>
* [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
2007-08-03 12:18 ` Jason Wessel
@ 2007-08-03 19:48 ` Charles Duffy
2007-08-06 19:48 ` Charles Duffy
1 sibling, 0 replies; 10+ messages in thread
From: Charles Duffy @ 2007-08-03 19:48 UTC (permalink / raw)
To: qemu-devel
Well, behavior with the patch applied is certainly different.
The large download I'm running still times out; however, it is now able
to resume without needing to bring the interface down and back up.
However, after the first timeout, subsequent timeouts occur with much
greater frequency -- still making this multi-GB download an
impracticality when using -net tap.
The flood ping is not killing the network connection, though it is
interrupted by frequent messages: "Warning: time of day goes back
(-23150us), taking countermeasures". (This is on the high end of the
time variances shown; the smallest are on the scale of 120us).
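As far as I know, ping prints the "time of day goes back" warning when a computed round-trip time comes out negative, which points at the guest clock stepping backward. A simplified stand-in for measuring that is sketched below: given clock samples in microseconds, count the backward steps and record the largest one. The function name and the sample values are illustrative assumptions, not output from the runs described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Count samples that are earlier than their predecessor.
 * If worst is non-NULL, store the most negative step there (0 if none). */
size_t count_backward_steps(const int64_t *usec, size_t n, int64_t *worst)
{
    size_t i;
    size_t count = 0;
    int64_t min_delta = 0;

    for (i = 1; i < n; i++) {
        int64_t delta = usec[i] - usec[i - 1];

        if (delta < 0) {
            count++;
            if (delta < min_delta)
                min_delta = delta;
        }
    }
    if (worst != NULL)
        *worst = min_delta;
    return count;
}
```

Feeding it timestamps collected in a tight loop inside the guest would give a rough picture of how often, and by how much, the clock regresses under load.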
* Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
2007-08-03 15:02 ` Jason Wessel
@ 2007-08-04 2:48 ` Luke -Jr
0 siblings, 0 replies; 10+ messages in thread
From: Luke -Jr @ 2007-08-04 2:48 UTC (permalink / raw)
To: qemu-devel
FWIW, I suspect this problem is specific to tap+bridge+qemu combination.
On Friday 03 August 2007 15:02, Jason Wessel wrote:
> The RTC message has nothing to do with the interrupt controller load.
>
> The patch I mentioned was aimed at stability/bug fix. Nothing to do
> with performance what so ever.
>
> The simple test that you can usually break the qemu interrupt controller
> with is to do a "ping -f" to the target when using TAP. Then just run
> some other processes on the target or try to use the network with telnet
> or write to the disk with echo file > blah ; sync... It usually doesn't
> last too long. It is the "ping -f" that will keep the interrupt load
> at the max.
>
> Jason.
>
> n schembr wrote:
> > I'm seeing the same rtc error but my systems are not hanging. I can
> > still get to them and they seem to handle a good load from time to
> > time, 4 running proc.
> >
> > Is this a stability or performance issue?
> >
> > If it is a stability issue how do I test it?
> >
> > ----- Original Message ----
> > From: Jason Wessel <jason.wessel@windriver.com>
> > To: charles@dyfis.net; qemu-devel@nongnu.org
> > Sent: Friday, August 3, 2007 8:18:50 AM
> > Subject: Re: [Qemu-devel] Re: Network connections stalling (due to
> > lost interrupts/ticks?)
> >
> > Charles,
> >
> > Are you willing to try an experimental patch?
> >
> > Perhaps you could try the attached patch and post back if it happens to
> > solve your problem. There is most definitely a problem where qemu can
> > get hung up indefinitely after an "interrupt storm". I had not ever
> > submitted it because there is no clean way to do this via the opaque
> > information that is passed around. It seems wrong to have to make the
> > ioapic a global. If this does fix the problem perhaps someone will
> > decide to fix this up in a cleaner fashion via the opaque structures.
> >
> > Jason.
> >
> > Charles Duffy wrote:
> > > Charles Duffy wrote:
> > >> There's a warning on startup that the system can't set a 1024Hz timer,
> > >> which persists even after I set /proc/sys/dev/rtc/max-user-freq to
> >
> > 1024,
> >
> > >> and I occasionally get warnings at runtime ("Your time source seems to
> > >> be instable or some driver is hogging interrupts").
> > >
> > > This was happening because my host kernel was compiled with
> > > CONFIG_HPET_RTC_IRQ=y. I've disabled this option, recompiled and
> > > rebooted, and it resolved the RTC warning (and apparently, the unstable
> > > time source messages) -- but my network connections are still stalling.
* [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)
2007-08-03 12:18 ` Jason Wessel
2007-08-03 19:48 ` Charles Duffy
@ 2007-08-06 19:48 ` Charles Duffy
1 sibling, 0 replies; 10+ messages in thread
From: Charles Duffy @ 2007-08-06 19:48 UTC (permalink / raw)
To: qemu-devel
[Resending as the last copy seems to have been lost in the ether -- it
certainly isn't in the GMANE archive]
Well, behavior with the patch applied is certainly different.
The large download I'm running still times out; however, it is now able
to resume without needing to bring the interface down and back up.
However, after the first timeout, subsequent timeouts occur with much
greater frequency -- making this multi-GB download still an
impracticality when using -net tap.
The flood ping is not killing the network connection, though it is
interrupted by frequent messages: "Warning: time of day goes back
(-23150us), taking countermeasures". (This is on the high end of the
time variances shown; the smallest are on the scale of 120us).
[btw -- Wind River? heh -- 'yall were The Great Enemy back when I was at
MontaVista around 2000-2002 or so]