* network shutdown under heavy load
@ 2009-12-14 15:49 rek2
2009-12-16 12:17 ` Avi Kivity
0 siblings, 1 reply; 20+ messages in thread
From: rek2 @ 2009-12-14 15:49 UTC (permalink / raw)
To: kvm
Hello, we've noticed that when we stress any of our guests (they are
all Fedora), the KVM network shuts down. Has anyone experienced this?
Thanks
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2009-12-14 15:49 network shutdown under heavy load rek2
@ 2009-12-16 12:17 ` Avi Kivity
2009-12-16 13:19 ` Herbert Xu
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-12-16 12:17 UTC (permalink / raw)
To: rek2; +Cc: kvm, Herbert Xu
On 12/14/2009 05:49 PM, rek2 wrote:
> Hello, we've noticed that when we stress any of our guests (they are
> all Fedora), the KVM network shuts down. Has anyone experienced this?
>
Herbert?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2009-12-16 12:17 ` Avi Kivity
@ 2009-12-16 13:19 ` Herbert Xu
2009-12-17 18:15 ` rek2
0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2009-12-16 13:19 UTC (permalink / raw)
To: Avi Kivity; +Cc: rek2, kvm
On Wed, Dec 16, 2009 at 02:17:04PM +0200, Avi Kivity wrote:
> On 12/14/2009 05:49 PM, rek2 wrote:
>> Hello, we've noticed that when we stress any of our guests (they are
>> all Fedora), the KVM network shuts down. Has anyone experienced this?
>
> Herbert?
What's the exact guest kernel version? When the network is down,
please get onto the guest console to determine which direction
(if not both) of the network is not functioning.
You can run tcpdump in the guest/host and execute pings on both
sides to see which direction is blocked.
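For example, a minimal sketch (tap0, eth0 and the addresses are
placeholders for whatever your setup actually uses):

  # on the host, watching the guest's tap interface
  tcpdump -n -i tap0 icmp

  # inside the guest, watching its virtio interface
  tcpdump -n -i eth0 icmp

  # then ping in each direction and note which side sees the packets
  ping -c 5 <guest-ip>     # run from the host
  ping -c 5 <host-ip>      # run from the guest

If the guest's tcpdump shows the host's pings arriving but the host
never sees the replies, the guest's TX direction is the one that is
stuck; the mirror pattern points at RX.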
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2009-12-16 13:19 ` Herbert Xu
@ 2009-12-17 18:15 ` rek2
2009-12-18 1:27 ` Herbert Xu
0 siblings, 1 reply; 20+ messages in thread
From: rek2 @ 2009-12-17 18:15 UTC (permalink / raw)
To: Herbert Xu; +Cc: Avi Kivity, kvm, Misha Pivovarov
> What's the exact guest kernel version? When the network is down,
> please get onto the guest console to determine which direction
> (if not both) of the network is not functioning.
>
> You can run tcpdump in the guest/host and execute pings on both
> sides to see which direction is blocked.
>
> Cheers,
>
on the hosts:
uname -a
Linux XXXX 2.6.31-16-server #53-Ubuntu SMP Tue Dec 8 05:08:02 UTC 2009
x86_64 GNU/Linux
I was told that the network went down again today, and one of the guys
here had to log in via the console and restart it for that particular
guest.
on the guest:
uname -a
Linux XXXX 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT
2009 x86_64 x86_64 x86_64 GNU/Linux
Next time it goes down I will try to run a sniffer on both sides.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2009-12-17 18:15 ` rek2
@ 2009-12-18 1:27 ` Herbert Xu
2009-12-21 16:39 ` rek2
0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2009-12-18 1:27 UTC (permalink / raw)
To: rek2; +Cc: Avi Kivity, kvm, Misha Pivovarov
On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote:
>
> I was told that the network went down again today, and one of the guys
> here had to log in via the console and restart it for that particular
> guest.
>
> on the guest:
> uname -a
> Linux XXXX 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT
> 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> Next time it goes down I will try to run a sniffer on both sides.
OK I'm fairly sure this version has a buggy virtio-net. Does
this patch (if it applies :) help?
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9eec5a5..74b3854 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data)
vi->svq->vq_ops->kick(vi->svq);
vi->last_xmit_skb = NULL;
}
- if (vi->free_in_tasklet)
+ if (vi->free_in_tasklet) {
free_old_xmit_skbs(vi);
+ netif_wake_queue(vi->dev);
+ }
netif_tx_unlock_bh(vi->dev);
}
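If you want to test it, here is a rough sketch of rebuilding just the
module (this assumes a full kernel source tree matching the running
guest kernel, configured with its .config; the patch file name is made
up):

  cd ~/linux-2.6.27.25                     # assumed location of the source tree
  patch -p1 < virtio_net-wake-queue.diff   # hypothetical file holding the diff above
  make oldconfig && make modules_prepare
  make drivers/net/virtio_net.ko           # rebuild only this module
  modprobe -r virtio_net
  insmod drivers/net/virtio_net.ko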
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2009-12-18 1:27 ` Herbert Xu
@ 2009-12-21 16:39 ` rek2
2010-01-07 17:02 ` rek2
0 siblings, 1 reply; 20+ messages in thread
From: rek2 @ 2009-12-21 16:39 UTC (permalink / raw)
To: Herbert Xu; +Cc: Avi Kivity, kvm
You say this version is buggy.. is there a newer version with this
patch already applied to it?
Thanks
On 12/17/09 20:27, Herbert Xu wrote:
> On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote:
>
>> I was told that the network went down again today, and one of the guys
>> here had to log in via the console and restart it for that particular
>> guest.
>>
>> on the guest:
>> uname -a
>> Linux XXXX 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT
>> 2009 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Next time it goes down I will try to run a sniffer on both sides.
>>
> OK I'm fairly sure this version has a buggy virtio-net. Does
> this patch (if it applies :) help?
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9eec5a5..74b3854 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data)
> vi->svq->vq_ops->kick(vi->svq);
> vi->last_xmit_skb = NULL;
> }
> - if (vi->free_in_tasklet)
> + if (vi->free_in_tasklet) {
> free_old_xmit_skbs(vi);
> + netif_wake_queue(vi->dev);
> + }
> netif_tx_unlock_bh(vi->dev);
> }
>
> Cheers,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2009-12-21 16:39 ` rek2
@ 2010-01-07 17:02 ` rek2
2010-01-10 12:30 ` Avi Kivity
0 siblings, 1 reply; 20+ messages in thread
From: rek2 @ 2010-01-07 17:02 UTC (permalink / raw)
To: Herbert Xu; +Cc: Avi Kivity, kvm
Hi guys, it happened again (on this server I didn't apply the fix you
guys sent; I left it unpatched so that when it happened I could test
with tcpdump). It seems the guest can receive packets but can't send:
when I ran tcpdump I saw traffic coming in, but not out.
Hope this helps..
Also, I need to know if the patch you guys sent me will be in newer
versions; if not I'd like to know, since I can't update.
On 12/21/09 11:39 a.m., rek2 wrote:
> You say this version is buggy.. is there a newer version with this
> patch already applied to it?
>
> Thanks
>
>
>
> On 12/17/09 20:27, Herbert Xu wrote:
>> On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote:
>>> I was told that the network went down again today, and one of the guys
>>> here had to log in via the console and restart it for that particular
>>> guest.
>>>
>>> on the guest:
>>> uname -a
>>> Linux XXXX 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT
>>> 2009 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Next time it goes down I will try to run a sniffer on both sides.
>> OK I'm fairly sure this version has a buggy virtio-net. Does
>> this patch (if it applies :) help?
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9eec5a5..74b3854 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data)
>> vi->svq->vq_ops->kick(vi->svq);
>> vi->last_xmit_skb = NULL;
>> }
>> - if (vi->free_in_tasklet)
>> + if (vi->free_in_tasklet) {
>> free_old_xmit_skbs(vi);
>> + netif_wake_queue(vi->dev);
>> + }
>> netif_tx_unlock_bh(vi->dev);
>> }
>>
>> Cheers,
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-07 17:02 ` rek2
@ 2010-01-10 12:30 ` Avi Kivity
2010-01-10 12:35 ` Herbert Xu
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2010-01-10 12:30 UTC (permalink / raw)
To: rek2; +Cc: Herbert Xu, kvm
On 01/07/2010 07:02 PM, rek2 wrote:
> Hi guys, it happened again (on this server I didn't apply the fix you
> guys sent; I left it unpatched so that when it happened I could test
> with tcpdump). It seems the guest can receive packets but can't send:
> when I ran tcpdump I saw traffic coming in, but not out.
>
> Hope this helps..
> Also, I need to know if the patch you guys sent me will be in newer
> versions; if not I'd like to know, since I can't update.
>
This isn't in 2.6.27.y. Herbert, can you send it there?
Your Fedora 10 kernel isn't supported any more, though.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-10 12:30 ` Avi Kivity
@ 2010-01-10 12:35 ` Herbert Xu
2010-01-10 12:38 ` Avi Kivity
0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2010-01-10 12:35 UTC (permalink / raw)
To: Avi Kivity; +Cc: rek2, kvm
On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
>
> This isn't in 2.6.27.y. Herbert, can you send it there?
It appears that now that TX is fixed we have a similar problem
with RX. Once I figure that one out I'll send them together.
Who is maintaining that BTW, stable@kernel.org?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-10 12:35 ` Herbert Xu
@ 2010-01-10 12:38 ` Avi Kivity
2010-01-13 19:13 ` Tom Lendacky
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2010-01-10 12:38 UTC (permalink / raw)
To: Herbert Xu; +Cc: rek2, kvm
On 01/10/2010 02:35 PM, Herbert Xu wrote:
> On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
>
>> This isn't in 2.6.27.y. Herbert, can you send it there?
>>
> It appears that now that TX is fixed we have a similar problem
> with RX. Once I figure that one out I'll send them together.
>
Thanks.
> Who is maintaining that BTW, stable@kernel.org?
>
Yes.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-10 12:38 ` Avi Kivity
@ 2010-01-13 19:13 ` Tom Lendacky
2010-01-13 21:52 ` Chris Wright
0 siblings, 1 reply; 20+ messages in thread
From: Tom Lendacky @ 2010-01-13 19:13 UTC (permalink / raw)
To: Avi Kivity; +Cc: Herbert Xu, rek2, kvm
On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> >> This isn't in 2.6.27.y. Herbert, can you send it there?
> >
> > It appears that now that TX is fixed we have a similar problem
> > with RX. Once I figure that one out I'll send them together.
>
I've been experiencing the network shutdown issue also. I've been running
netperf tests across 10GbE adapters with Qemu 0.12.1.2, RHEL5.4 guests and
2.6.32 kernel (from kvm.git) guests. I instrumented Qemu to print out some
network statistics. It appears that at some point in the netperf test the
receiving guest ends up having the 10GbE device "receive_disabled" variable in
its VLANClientState structure stuck at 1. From looking at the code it appears
that the virtio-net driver in the guest should cause qemu_flush_queued_packets
in net.c to eventually run and clear the "receive_disabled" variable but it's
not happening. I don't seem to have these issues when I have a lot of debug
settings active in the guest kernel which results in very low/poor network
performance - maybe some kind of race condition?
Tom
> Thanks.
>
> > Who is maintaining that BTW, stable@kernel.org?
>
> Yes.
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-13 19:13 ` Tom Lendacky
@ 2010-01-13 21:52 ` Chris Wright
2010-01-19 21:29 ` Tom Lendacky
0 siblings, 1 reply; 20+ messages in thread
From: Chris Wright @ 2010-01-13 21:52 UTC (permalink / raw)
To: Tom Lendacky; +Cc: Avi Kivity, Herbert Xu, rek2, kvm, markmc
(Mark cc'd, sound familiar?)
* Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> > On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> > >> This isn't in 2.6.27.y. Herbert, can you send it there?
> > >
> > > It appears that now that TX is fixed we have a similar problem
> > > with RX. Once I figure that one out I'll send them together.
> >
>
> I've been experiencing the network shutdown issue also. I've been running
> netperf tests across 10GbE adapters with Qemu 0.12.1.2, RHEL5.4 guests and
> 2.6.32 kernel (from kvm.git) guests. I instrumented Qemu to print out some
> network statistics. It appears that at some point in the netperf test the
> receiving guest ends up having the 10GbE device "receive_disabled" variable in
> its VLANClientState structure stuck at 1. From looking at the code it appears
> that the virtio-net driver in the guest should cause qemu_flush_queued_packets
> in net.c to eventually run and clear the "receive_disabled" variable but it's
> not happening. I don't seem to have these issues when I have a lot of debug
> settings active in the guest kernel which results in very low/poor network
> performance - maybe some kind of race condition?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-13 21:52 ` Chris Wright
@ 2010-01-19 21:29 ` Tom Lendacky
2010-01-19 23:57 ` Chris Wright
0 siblings, 1 reply; 20+ messages in thread
From: Tom Lendacky @ 2010-01-19 21:29 UTC (permalink / raw)
To: Chris Wright; +Cc: Avi Kivity, Herbert Xu, rek2, kvm, markmc
On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
> (Mark cc'd, sound familiar?)
>
> * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> > > On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > > > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> > > >> This isn't in 2.6.27.y. Herbert, can you send it there?
> > > >
> > > > It appears that now that TX is fixed we have a similar problem
> > > > with RX. Once I figure that one out I'll send them together.
> >
> > I've been experiencing the network shutdown issue also. I've been
> > running netperf tests across 10GbE adapters with Qemu 0.12.1.2, RHEL5.4
> > guests and 2.6.32 kernel (from kvm.git) guests. I instrumented Qemu to
> > print out some network statistics. It appears that at some point in the
> > netperf test the receiving guest ends up having the 10GbE device
> > "receive_disabled" variable in its VLANClientState structure stuck at 1.
> > From looking at the code it appears that the virtio-net driver in the
> > guest should cause qemu_flush_queued_packets in net.c to eventually run
> > and clear the "receive_disabled" variable but it's not happening. I
> > don't seem to have these issues when I have a lot of debug settings
> > active in the guest kernel which results in very low/poor network
> > performance - maybe some kind of race condition?
>
Ok, here's an update. After realizing that none of the ethtool offload options
were enabled in my guest, I found that I needed to be using the -netdev option
on the qemu command line. Once I did that, some ethtool offload options were
enabled and the deadlock did not appear when I did networking between guests
on different machines. However, the deadlock did appear when I did networking
between guests on the same machine.
Tom
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-19 21:29 ` Tom Lendacky
@ 2010-01-19 23:57 ` Chris Wright
2010-01-20 15:48 ` Tom Lendacky
0 siblings, 1 reply; 20+ messages in thread
From: Chris Wright @ 2010-01-19 23:57 UTC (permalink / raw)
To: Tom Lendacky; +Cc: Chris Wright, Avi Kivity, Herbert Xu, rek2, kvm, markmc
* Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
> > (Mark cc'd, sound familiar?)
> >
> > * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > > On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> > > > On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > > > > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> > > > >> This isn't in 2.6.27.y. Herbert, can you send it there?
> > > > >
> > > > > It appears that now that TX is fixed we have a similar problem
> > > > > with RX. Once I figure that one out I'll send them together.
> > >
> > > I've been experiencing the network shutdown issue also. I've been
> > > running netperf tests across 10GbE adapters with Qemu 0.12.1.2, RHEL5.4
> > > guests and 2.6.32 kernel (from kvm.git) guests. I instrumented Qemu to
> > > print out some network statistics. It appears that at some point in the
> > > netperf test the receiving guest ends up having the 10GbE device
> > > "receive_disabled" variable in its VLANClientState structure stuck at 1.
> > > From looking at the code it appears that the virtio-net driver in the
> > > guest should cause qemu_flush_queued_packets in net.c to eventually run
> > > and clear the "receive_disabled" variable but it's not happening. I
> > > don't seem to have these issues when I have a lot of debug settings
> > > active in the guest kernel which results in very low/poor network
> > > performance - maybe some kind of race condition?
>
> Ok, here's an update. After realizing that none of the ethtool offload options
> were enabled in my guest, I found that I needed to be using the -netdev option
> on the qemu command line. Once I did that, some ethtool offload options were
> enabled and the deadlock did not appear when I did networking between guests
> on different machines. However, the deadlock did appear when I did networking
> between guests on the same machine.
What does your full command line look like? And when the networking
stops does your same receive_disabled hack make things work?
thanks,
-chris
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-19 23:57 ` Chris Wright
@ 2010-01-20 15:48 ` Tom Lendacky
2010-01-26 21:59 ` Tom Lendacky
0 siblings, 1 reply; 20+ messages in thread
From: Tom Lendacky @ 2010-01-20 15:48 UTC (permalink / raw)
To: Chris Wright; +Cc: Avi Kivity, Herbert Xu, rek2, kvm, markmc
On Tuesday 19 January 2010 05:57:53 pm Chris Wright wrote:
> * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
> > > (Mark cc'd, sound familiar?)
> > >
> > > * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > > > On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> > > > > On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > > > > > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> > > > > >> This isn't in 2.6.27.y. Herbert, can you send it there?
> > > > > >
> > > > > > It appears that now that TX is fixed we have a similar problem
> > > > > > with RX. Once I figure that one out I'll send them together.
> > > >
> > > > I've been experiencing the network shutdown issue also. I've been
> > > > running netperf tests across 10GbE adapters with Qemu 0.12.1.2,
> > > > RHEL5.4 guests and 2.6.32 kernel (from kvm.git) guests. I
> > > > instrumented Qemu to print out some network statistics. It appears
> > > > that at some point in the netperf test the receiving guest ends up
> > > > having the 10GbE device "receive_disabled" variable in its
> > > > VLANClientState structure stuck at 1. From looking at the code it
> > > > appears that the virtio-net driver in the guest should cause
> > > > qemu_flush_queued_packets in net.c to eventually run and clear the
> > > > "receive_disabled" variable but it's not happening. I don't seem to
> > > > have these issues when I have a lot of debug settings active in the
> > > > guest kernel which results in very low/poor network performance -
> > > > maybe some kind of race condition?
> >
> > Ok, here's an update. After realizing that none of the ethtool offload
> > options were enabled in my guest, I found that I needed to be using the
> > -netdev option on the qemu command line. Once I did that, some ethtool
> > offload options were enabled and the deadlock did not appear when I did
> > networking between guests on different machines. However, the deadlock
> > did appear when I did networking between guests on the same machine.
>
> What does your full command line look like? And when the networking
> stops does your same receive_disabled hack make things work?
The command line when using the -net option for the tap device is:

/usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
  -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
  -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51 \
  -net tap,vlan=0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
  -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1 \
  -net tap,vlan=1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
  -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize

when using the -netdev option for the tap device:

/usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
  -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
  -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 \
  -netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
  -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 \
  -netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
  -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize
The first ethernet device is a 1GbE device for communicating with the
automation infrastructure we have. The second ethernet device is the 10GbE
device that the netperf tests run on.
I can get the networking to work again by bringing down the interfaces and
reloading the virtio_net module (modprobe -r virtio_net / modprobe
virtio_net).
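Spelled out, the recovery sequence is roughly the following (eth0/eth1
stand in for the guest's interface names):

  # run from the guest console, since the network is down
  ifdown eth0; ifdown eth1
  modprobe -r virtio_net
  modprobe virtio_net
  ifup eth0; ifup eth1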
I haven't had a chance yet to run the tests against a modified version of qemu
that does not set the receive_disabled variable.
Tom
>
> thanks,
> -chris
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-20 15:48 ` Tom Lendacky
@ 2010-01-26 21:59 ` Tom Lendacky
2010-02-09 20:03 ` Jean-Philippe Menil
0 siblings, 1 reply; 20+ messages in thread
From: Tom Lendacky @ 2010-01-26 21:59 UTC (permalink / raw)
To: Chris Wright; +Cc: Avi Kivity, Herbert Xu, rek2, kvm, markmc
On Wednesday 20 January 2010 09:48:04 am Tom Lendacky wrote:
> On Tuesday 19 January 2010 05:57:53 pm Chris Wright wrote:
> > * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > > On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
> > > > (Mark cc'd, sound familiar?)
> > > >
> > > > * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > > > > On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> > > > > > On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > > > > > > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> > > > > > >> This isn't in 2.6.27.y. Herbert, can you send it there?
> > > > > > >
> > > > > > > It appears that now that TX is fixed we have a similar problem
> > > > > > > with RX. Once I figure that one out I'll send them together.
> > > > >
> > > > > I've been experiencing the network shutdown issue also. I've been
> > > > > running netperf tests across 10GbE adapters with Qemu 0.12.1.2,
> > > > > RHEL5.4 guests and 2.6.32 kernel (from kvm.git) guests. I
> > > > > instrumented Qemu to print out some network statistics. It appears
> > > > > that at some point in the netperf test the receiving guest ends up
> > > > > having the 10GbE device "receive_disabled" variable in its
> > > > > VLANClientState structure stuck at 1. From looking at the code it
> > > > > appears that the virtio-net driver in the guest should cause
> > > > > qemu_flush_queued_packets in net.c to eventually run and clear the
> > > > > "receive_disabled" variable but it's not happening. I don't seem
> > > > > to have these issues when I have a lot of debug settings active in
> > > > > the guest kernel which results in very low/poor network performance
> > > > > - maybe some kind of race condition?
> > >
> > > Ok, here's an update. After realizing that none of the ethtool offload
> > > options were enabled in my guest, I found that I needed to be using the
> > > -netdev option on the qemu command line. Once I did that, some ethtool
> > > offload options were enabled and the deadlock did not appear when I did
> > > networking between guests on different machines. However, the deadlock
> > > did appear when I did networking between guests on the same machine.
> >
> > What does your full command line look like? And when the networking
> > stops does your same receive_disabled hack make things work?
>
> The command line when using the -net option for the tap device is:
>
> /usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
>   -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
>   -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51 \
>   -net tap,vlan=0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
>   -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1 \
>   -net tap,vlan=1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
>   -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize
>
> when using the -netdev option for the tap device:
>
> /usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
>   -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
>   -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 \
>   -netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
>   -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 \
>   -netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
>   -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize
>
>
> The first ethernet device is a 1GbE device for communicating with the
> automation infrastructure we have. The second ethernet device is the 10GbE
> device that the netperf tests run on.
>
> I can get the networking to work again by bringing down the interfaces and
> reloading the virtio_net module (modprobe -r virtio_net / modprobe
> virtio_net).
>
> I haven't had a chance yet to run the tests against a modified version of
> qemu that does not set the receive_disabled variable.
I got a chance to run with the setting of the receive_disabled variable
commented out, and I still run into the problem. It's easier to reproduce when
running netperf between two guests on the same machine. I instrumented qemu
and virtio a little bit to try and track this down. What I'm seeing is that,
with two guests on the same machine, the receiving (netserver) guest
eventually gets into a condition where the tap read poll callback is disabled
and never re-enabled. So packets are never delivered from tap to qemu and to
the guest. On the sending (netperf) side the transmit queue eventually runs
out of capacity and it can no longer send packets (I believe this is unique to
having the guests on the same machine). And as before, bringing down the
interfaces, reloading the virtio_net module, and restarting the interfaces
clears things up.
Tom
>
> Tom
>
> > thanks,
> > -chris
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-01-26 21:59 ` Tom Lendacky
@ 2010-02-09 20:03 ` Jean-Philippe Menil
2010-02-09 20:25 ` Chris Wright
0 siblings, 1 reply; 20+ messages in thread
From: Jean-Philippe Menil @ 2010-02-09 20:03 UTC (permalink / raw)
To: Tom Lendacky; +Cc: Chris Wright, Avi Kivity, Herbert Xu, rek2, kvm, markmc
Tom Lendacky wrote:
> On Wednesday 20 January 2010 09:48:04 am Tom Lendacky wrote:
>
>> On Tuesday 19 January 2010 05:57:53 pm Chris Wright wrote:
>>
>>> * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
>>>
>>>> On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
>>>>
>>>>> (Mark cc'd, sound familiar?)
>>>>>
>>>>> * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
>>>>>
>>>>>> On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
>>>>>>
>>>>>>> On 01/10/2010 02:35 PM, Herbert Xu wrote:
>>>>>>>
>>>>>>>> On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
>>>>>>>>
>>>>>>>>> This isn't in 2.6.27.y. Herbert, can you send it there?
>>>>>>>>>
>>>>>>>> It appears that now that TX is fixed we have a similar problem
>>>>>>>> with RX. Once I figure that one out I'll send them together.
>>>>>>>>
>>>>>> I've been experiencing the network shutdown issue also. I've been
>>>>>> running netperf tests across 10GbE adapters with Qemu 0.12.1.2,
>>>>>> RHEL5.4 guests and 2.6.32 kernel (from kvm.git) guests. I
>>>>>> instrumented Qemu to print out some network statistics. It appears
>>>>>> that at some point in the netperf test the receiving guest ends up
>>>>>> having the 10GbE device "receive_disabled" variable in its
>>>>>> VLANClientState structure stuck at 1. From looking at the code it
>>>>>> appears that the virtio-net driver in the guest should cause
>>>>>> qemu_flush_queued_packets in net.c to eventually run and clear the
>>>>>> "receive_disabled" variable but it's not happening. I don't seem
>>>>>> to have these issues when I have a lot of debug settings active in
>>>>>> the guest kernel which results in very low/poor network performance
>>>>>> - maybe some kind of race condition?
>>>>>>
>>>> Ok, here's an update. After realizing that none of the ethtool offload
>>>> options were enabled in my guest, I found that I needed to be using the
>>>> -netdev option on the qemu command line. Once I did that, some ethtool
>>>> offload options were enabled and the deadlock did not appear when I did
>>>> networking between guests on different machines. However, the deadlock
>>>> did appear when I did networking between guests on the same machine.
>>>>
>>> What does your full command line look like? And when the networking
>>> stops does your same receive_disabled hack make things work?
>>>
>> The command line when using the -net option for the tap device is:
>>
>> /usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
>>   -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
>>   -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51 \
>>   -net tap,vlan=0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
>>   -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1 \
>>   -net tap,vlan=1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
>>   -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize
>>
>> when using the -netdev option for the tap device:
>>
>> /usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
>>   -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
>>   -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 \
>>   -netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
>>   -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 \
>>   -netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
>>   -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize
>>
>>
>> The first ethernet device is a 1GbE device for communicating with the
>> automation infrastructure we have. The second ethernet device is the 10GbE
>> device that the netperf tests run on.
>>
>> I can get the networking to work again by bringing down the interfaces and
>> reloading the virtio_net module (modprobe -r virtio_net / modprobe
>> virtio_net).
>>
>> I haven't had a chance yet to run the tests against a modified version of
>> qemu that does not set the receive_disabled variable.
>>
>
> I got a chance to run with the setting of the receive_disabled variable
> commented out, and I still run into the problem. It's easier to reproduce when
> running netperf between two guests on the same machine. I instrumented qemu
> and virtio a little bit to try and track this down. What I'm seeing is that,
> with two guests on the same machine, the receiving (netserver) guest
> eventually gets into a condition where the tap read poll callback is disabled
> and never re-enabled. So packets are never delivered from tap to qemu and to
> the guest. On the sending (netperf) side the transmit queue eventually runs
> out of capacity and it can no longer send packets (I believe this is unique to
> having the guests on the same machine). And as before, bringing down the
> interfaces, reloading the virtio_net module, and restarting the interfaces
> clears things up.
>
> Tom
>
>
>> Tom
>>
>>
>>> thanks,
>>> -chris
Hi,
it seems that I'm encountering the same bug.
I have a guest under high network load, and after some time it seems
that there's "no more network": from the guest, I can no longer ping
the gateway. If I restart the guest, everything works fine again.
My environment:
Debian/Squeeze
host2-kvm:~# uname -a
Linux host2-kvm 2.6.33-rc6-git4-jp #3 SMP Thu Feb 4 17:13:38 CET 2010
x86_64 GNU/Linux
It's a 2.6.33 kernel with these two patches from Patrick McHardy (from
Netfilter):
http://patchwork.kernel.org/patch/76980/
http://patchwork.kernel.org/patch/76980/
host2-kvm:~# virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.12.2
Under Debian/Lenny, with a 2.6.26 kernel, I don't encounter this bug.
Can someone tell me if there is any kernel option I should enable to
help debug this?
Many thanks.
Regards.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-02-09 20:03 ` Jean-Philippe Menil
@ 2010-02-09 20:25 ` Chris Wright
2010-02-09 20:46 ` Jean-Philippe Menil
0 siblings, 1 reply; 20+ messages in thread
From: Chris Wright @ 2010-02-09 20:25 UTC (permalink / raw)
To: Jean-Philippe Menil
Cc: Tom Lendacky, Chris Wright, Avi Kivity, Herbert Xu, rek2, kvm,
markmc
* Jean-Philippe Menil (jean-philippe.menil@univ-nantes.fr) wrote:
> it seems that I'm encountering the same bug.
>
> I have a guest under high network load, and after some time it seems
> that there's "no more network": from the guest, I can no longer ping
> the gateway. If I restart the guest, everything works fine again.
How reproducible is this (and can you also recover the network by simply
rmmod/modprobe'ing the virtio_net driver)? Tom posted a patch that you
could try if you have an easy way to trigger.
http://marc.info/?l=kvm&m=126564542625725&w=2
thanks,
-chris
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-02-09 20:25 ` Chris Wright
@ 2010-02-09 20:46 ` Jean-Philippe Menil
2010-02-10 7:53 ` Jean-Philippe Menil
0 siblings, 1 reply; 20+ messages in thread
From: Jean-Philippe Menil @ 2010-02-09 20:46 UTC (permalink / raw)
To: Chris Wright; +Cc: Tom Lendacky, Avi Kivity, Herbert Xu, rek2, kvm, markmc
Chris Wright wrote:
> * Jean-Philippe Menil (jean-philippe.menil@univ-nantes.fr) wrote:
>
>> it seems that I'm encountering the same bug.
>>
>> I have a guest under high network load, and after some time it seems
>> that there's "no more network": from the guest, I can no longer ping
>> the gateway. If I restart the guest, everything works fine again.
>>
>
> How reproducible is this (and can you also recover the network by simply
> rmmod/modprobe'ing the virtio_net driver)? Tom posted a patch that you
> could try if you have an easy way to trigger.
>
> http://marc.info/?l=kvm&m=126564542625725&w=2
>
> thanks,
> -chris
Thanks,
I've just seen the patch.
I will test it.
Many thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: network shutdown under heavy load
2010-02-09 20:46 ` Jean-Philippe Menil
@ 2010-02-10 7:53 ` Jean-Philippe Menil
0 siblings, 0 replies; 20+ messages in thread
From: Jean-Philippe Menil @ 2010-02-10 7:53 UTC (permalink / raw)
To: jean-philippe.menil
Cc: Chris Wright, Tom Lendacky, Avi Kivity, Herbert Xu, rek2, kvm,
markmc
Jean-Philippe Menil wrote:
> Chris Wright wrote:
>> * Jean-Philippe Menil (jean-philippe.menil@univ-nantes.fr) wrote:
>>
>>> it seems that I'm encountering the same bug.
>>>
>>> I have a guest under high network load, and after some time it
>>> seems that there's "no more network": from the guest, I can no
>>> longer ping the gateway. If I restart the guest, everything works
>>> fine again.
>>>
>>
>> How reproducible is this (and can you also recover the network by simply
>> rmmod/modprobe'ing the virtio_net driver)? Tom posted a patch that you
>> could try if you have an easy way to trigger.
>>
>> http://marc.info/?l=kvm&m=126564542625725&w=2
>>
>> thanks,
>> -chris
> Thanks,
>
> I've just seen the patch.
> I will test it.
>
> Many thanks.
Hi,
with the patch, the guest survived the night, instead of the usual two
hours. Thanks a lot, it solved the problem.
Regards.
^ permalink raw reply [flat|nested] 20+ messages in thread
Thread overview: 20+ messages
2009-12-14 15:49 network shutdown under heavy load rek2
2009-12-16 12:17 ` Avi Kivity
2009-12-16 13:19 ` Herbert Xu
2009-12-17 18:15 ` rek2
2009-12-18 1:27 ` Herbert Xu
2009-12-21 16:39 ` rek2
2010-01-07 17:02 ` rek2
2010-01-10 12:30 ` Avi Kivity
2010-01-10 12:35 ` Herbert Xu
2010-01-10 12:38 ` Avi Kivity
2010-01-13 19:13 ` Tom Lendacky
2010-01-13 21:52 ` Chris Wright
2010-01-19 21:29 ` Tom Lendacky
2010-01-19 23:57 ` Chris Wright
2010-01-20 15:48 ` Tom Lendacky
2010-01-26 21:59 ` Tom Lendacky
2010-02-09 20:03 ` Jean-Philippe Menil
2010-02-09 20:25 ` Chris Wright
2010-02-09 20:46 ` Jean-Philippe Menil
2010-02-10 7:53 ` Jean-Philippe Menil