public inbox for kvm@vger.kernel.org
* virtio_net hang
@ 2008-11-13 12:27 Emmanuel Lacour
  2008-11-13 13:04 ` Daniel P. Berrange
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-13 12:27 UTC
  To: kvm

Dear kvm users/developers,

I have a problem here where the network interface of a guest hangs
2 or 3 times a day. No more packets can be sent or received, with no
errors in the guest or host logs. I have to stop networking, remove the
module, then modprobe it again and start the network to get the
connection back.

My setup:
host: debian etch, kernel 2.6.26 amd64 (etch backports), kvm 73, using
libvirt
guest: debian sarge, kernel 2.6.26 686 (from etch backports)


I looked at the changelogs for the userspace kvm tools as well as the
kernel but didn't find anything relevant to this problem.


Any help would be welcome :)


the guest config:

<domain type='kvm' id='7'>
  <name>bar</name>
  <uuid>8055f2fc-df9d-0ec2-9707-283ca503eb95</uuid>
  <memory>4194304</memory>
  <currentMemory>4194304</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <source dev='/dev/vg_foo/vm_bar'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/var/lib/kvm/isos/debian-31r0-i386-netinst.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <mac address='00:16:3e:02:00:15'/>
      <source bridge='br0'/>
      <target dev='tap5'/>
      <model type='virtio'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5905' listen='127.0.0.1' keymap='fr'/>
  </devices>
</domain>



* Re: virtio_net hang
  2008-11-13 12:27 virtio_net hang Emmanuel Lacour
@ 2008-11-13 13:04 ` Daniel P. Berrange
  2008-11-13 13:15   ` Emmanuel Lacour
  2008-11-13 15:12 ` Mark McLoughlin
  2008-11-13 18:27 ` Fabio Coatti
  2 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrange @ 2008-11-13 13:04 UTC
  To: Emmanuel Lacour; +Cc: kvm

On Thu, Nov 13, 2008 at 01:27:09PM +0100, Emmanuel Lacour wrote:
> Dear kvm users/developers,
> 
> I have a problem here where the network interface of a guest hangs
> 2 or 3 times a day. No more packets can be sent or received, with no
> errors in the guest or host logs. I have to stop networking, remove the
> module, then modprobe it again and start the network to get the
> connection back.
> 
> My setup:
> host: debian etch, kernel 2.6.26 amd64 (etch backports), kvm 73, using
> libvirt
> guest: debian sarge, kernel 2.6.26 686 (from etch backports)
> 
> 
> I looked at the changelogs for the userspace kvm tools as well as the
> kernel but didn't find anything relevant to this problem.
> 
> 
> Any help would be welcome :)
> 
> 
> the guest config:

Many of the KVM developers don't use libvirt, so probably best if you
post the actual KVM command line libvirt spawned - you can get it from
the logfile in /var/log/libvirt/qemu/$NAME.log, where $NAME is your
guest's name.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: virtio_net hang
  2008-11-13 13:04 ` Daniel P. Berrange
@ 2008-11-13 13:15   ` Emmanuel Lacour
  0 siblings, 0 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-13 13:15 UTC
  To: kvm

On Thu, Nov 13, 2008 at 01:04:05PM +0000, Daniel P. Berrange wrote:
> 
> Many of the KVM developers don't use libvirt, so probably best if you
> post the actual KVM command line libvirt spawned - you can get it from
> the logfile in /var/log/libvirt/qemu/$NAME.log, where $NAME is your
> guest's name.
> 

You're right, here it is:

/usr/bin/kvm -S \
    -M pc \
    -m 4096 \
    -smp 2 \
    -name bar \
    -monitor pty \
    -boot c \
    -drive file=/dev/vg_foo/vm_bar,if=ide,index=0,boot=on \
    -drive file=/var/lib/kvm/isos/debian-31r0-i386-netinst.iso,if=ide,media=cdrom,index=2 \
    -net nic,macaddr=00:16:3e:02:00:15,vlan=0,model=virtio \
    -net tap,fd=29,script=,vlan=0,ifname=tap5 \
    -serial none \
    -parallel none \
    -usb \
    -vnc 127.0.0.1:5 \
    -k fr


* Re: virtio_net hang
  2008-11-13 12:27 virtio_net hang Emmanuel Lacour
  2008-11-13 13:04 ` Daniel P. Berrange
@ 2008-11-13 15:12 ` Mark McLoughlin
  2008-11-13 15:24   ` Emmanuel Lacour
  2008-11-13 18:27 ` Fabio Coatti
  2 siblings, 1 reply; 17+ messages in thread
From: Mark McLoughlin @ 2008-11-13 15:12 UTC
  To: Emmanuel Lacour; +Cc: kvm

On Thu, 2008-11-13 at 13:27 +0100, Emmanuel Lacour wrote:
> Dear kvm users/developers,
> 
> I have a problem here where the network interface of a guest hangs
> 2 or 3 times a day. No more packets can be sent or received, with no
> errors in the guest or host logs. I have to stop networking, remove the
> module, then modprobe it again and start the network to get the
> connection back.

The fact that re-loading the virtio_net driver fixes things up makes me
suspect you've found a bug in the virtio_net driver, rather than e.g. a
bug in the kvm-userspace side.

To try and narrow down what's happening, when the interface has hung,
try:

  - tcpdump on both eth0 in the guest and the tap device on the host 
    (tap5 in your example)

  - look for anything unusual in the stats for both those interfaces,
     e.g. /proc/net/dev, netstat -s etc.

  - strace the /usr/bin/kvm process

What you're looking for e.g. is whether a guest->host ping is failing
because the request packet isn't getting to the host or because the
reply packet isn't getting to the guest, and where exactly the packet is
being blocked.
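A minimal Python sketch (not part of the original thread) of the /proc/net/dev check suggested above: it parses the file into per-interface rx/tx counters and diffs two snapshots, so you can see which counters move on eth0 (guest) or tap5 (host) once the link has hung. As far as I know, net-tools' `ifconfig` prints the `fifo` columns as "overruns", so `tx` `fifo` on the tap device is the figure `ifconfig` shows as TX overruns. The function names are illustrative.

```python
# Minimal sketch: parse /proc/net/dev and diff counters between two snapshots.
# Column order follows the two header lines the kernel prints in that file.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]


def parse_proc_net_dev(text):
    """Return {ifname: {"rx": {...}, "tx": {...}}} parsed from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)     # old kernels may omit the space after ':'
        values = [int(v) for v in data.split()]
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:16])),
        }
    return stats


def snapshot(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())


def diff_stats(before, after, ifname):
    """Increase of every rx/tx counter of one interface between two snapshots."""
    return {
        direction: {
            key: after[ifname][direction][key] - before[ifname][direction][key]
            for key in after[ifname][direction]
        }
        for direction in ("rx", "tx")
    }
```

Typical use: take `before = snapshot()` while the link still works and `after = snapshot()` once it has hung, then look at `diff_stats(before, after, "tap5")` on the host (or `"eth0"` in the guest).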

Cheers,
Mark.



* Re: virtio_net hang
  2008-11-13 15:12 ` Mark McLoughlin
@ 2008-11-13 15:24   ` Emmanuel Lacour
  2008-11-14  9:23     ` Emmanuel Lacour
  0 siblings, 1 reply; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-13 15:24 UTC
  To: kvm

On Thu, Nov 13, 2008 at 03:12:33PM +0000, Mark McLoughlin wrote:
> The fact that re-loading the virtio_net driver fixes things up makes me
> suspect you've found a bug in the virtio_net driver, rather than e.g. a
> bug in the kvm-userspace side.
> 
> To try and narrow down what's happening, when the interface has hung,
> try:
> 
>   - tcpdump on both eth0 in the guest and the tap device on the host 
>     (tap5 in your example)
> 
>   - look for anything unusual in the stats for both those interfaces,
>      e.g. /proc/net/dev, netstat -s etc.
> 
>   - strace the /usr/bin/kvm process
> 
> What you're looking for e.g. is whether a guest->host ping is failing
> because the request packet isn't getting to the host or because the
> reply packet isn't getting to the guest, and where exactly the packet is
> being blocked.
> 

Nice hints, thanks. I will try to debug this more deeply and come back with
more information :)



* Re: virtio_net hang
  2008-11-13 12:27 virtio_net hang Emmanuel Lacour
  2008-11-13 13:04 ` Daniel P. Berrange
  2008-11-13 15:12 ` Mark McLoughlin
@ 2008-11-13 18:27 ` Fabio Coatti
  2 siblings, 0 replies; 17+ messages in thread
From: Fabio Coatti @ 2008-11-13 18:27 UTC
  To: Emmanuel Lacour; +Cc: kvm

2008/11/13 Emmanuel Lacour <elacour@easter-eggs.com>:
> Dear kvm users/developers,
>
> I have a problem here where the network interface of a guest hangs
> 2 or 3 times a day. No more packets can be sent or received, with no
> errors in the guest or host logs. I have to stop networking, remove the
> module, then modprobe it again and start the network to get the
> connection back.
>
> My setup:
> host: debian etch, kernel 2.6.26 amd64 (etch backports), kvm 73, using
> libvirt
> guest: debian sarge, kernel 2.6.26 686 (from etch backports)
>
>
> I looked at the changelogs for the userspace kvm tools as well as the
> kernel but didn't find anything relevant to this problem.
>
>
> Any help would be welcome :)
>
2008/11/13 Mark McLoughlin <markmc@redhat.com>:

>
> To try and narrow down what's happening, when the interface has hung,
> try:
>
>  - tcpdump on both eth0 in the guest and the tap device on the host
>    (tap5 in your example)
>
>  - look for anything unusual in the stats for both those interfaces,
>     e.g. /proc/net/dev, netstat -s etc.
>
>  - strace the /usr/bin/kvm process
>
> What you're looking for e.g. is whether a guest->host ping is failing
> because the request packet isn't getting to the host or because the
> reply packet isn't getting to the guest, and where exactly the packet is
> being blocked.

We are seeing something similar under kvm-76, 77 and 78, with recent
kernels (2.6.27.X) as well as some 2.6.26.X.
Basically, after some time under high network load the interface stops
working. What we see when sniffing at the tap level is some ARP packets
going out, but no answer comes back from the network. Basically, it seems
that the machine gets disconnected. Anyway, I doubt that the host/external
networks have anything to do with this, as a reboot always makes the
network happy again.
I can't tell how much traffic flows through the interface before the
problem appears, but it does seem related to the amount of data transferred.
If someone can give me some directions on how to dig this I'll be
grateful. Of course, I'm using virtio drivers.
We are unable to reproduce this with full emulation devices.


* Re: virtio_net hang
  2008-11-13 15:24   ` Emmanuel Lacour
@ 2008-11-14  9:23     ` Emmanuel Lacour
  2008-11-14 18:26       ` Mark McLoughlin
  0 siblings, 1 reply; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-14  9:23 UTC
  To: kvm

On Thu, Nov 13, 2008 at 04:24:52PM +0100, Emmanuel Lacour wrote:
> On Thu, Nov 13, 2008 at 03:12:33PM +0000, Mark McLoughlin wrote:
> > The fact that re-loading the virtio_net driver fixes things up makes me
> > suspect you've found a bug in the virtio_net driver, rather than e.g. a
> > bug in the kvm-userspace side.
> > 
> > To try and narrow down what's happening, when the interface has hung,
> > try:
> > 
> >   - tcpdump on both eth0 in the guest and the tap device on the host 
> >     (tap5 in your example)
> > 


On eth0 I see echo requests, but _no_ echo replies
On tap5 I see echo requests _and_ echo replies

> >   - look for anything unusual in the stats for both those interfaces,
> >      e.g. /proc/net/dev, netstat -s etc.
> > 

Comparing with other guests without problems, the only difference is that this
tap (and only this one) reports "overruns":

tap5      Link encap:Ethernet  HWaddr 00:FF:AD:53:76:25  
          inet6 addr: fe80::2ff:adff:fe53:7625/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:717737621 errors:0 dropped:0 overruns:0 frame:0
          TX packets:636626720 errors:0 dropped:0 overruns:317 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:368973099756 (343.6 GiB)  TX bytes:217917073227 (202.9 GiB)

Overruns seem to happen only when there is a "hang"; the counter doesn't seem
to increase when the network is working properly.


> >   - strace the /usr/bin/kvm process
> > 

Unfortunately I was unable to do this: I can't reproduce the problem on a
test VM, and I can't leave this VM with a non-working network for analysis
because it is in production, so I have a script which pings and restarts the
module/interface when needed.
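The script itself isn't posted. What follows is only a hypothetical Python sketch of such a watchdog, assuming Debian's `ifdown`/`ifup`, the iputils `ping` options `-c`/`-W`, and a placeholder target address; the recovery sequence is the one described in the first message of the thread. The command runner is injectable so the logic can be exercised without touching a real interface.

```python
import subprocess
import time

# Hypothetical settings: adjust for the guest in question.
PING_TARGET = "192.0.2.1"   # placeholder address (TEST-NET-1), not from the thread
IFACE = "eth0"
MODULE = "virtio_net"

# Stop networking, remove the module, modprobe it again, restart networking.
RECOVERY_STEPS = [
    ["ifdown", IFACE],
    ["rmmod", MODULE],
    ["modprobe", MODULE],
    ["ifup", IFACE],
]


def run(cmd):
    """Run a command quietly and return its exit status."""
    return subprocess.call(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)


def link_ok(runner=run):
    # 3 echo requests, at most 2 s wait per reply; exit status 0 if any reply came back.
    return runner(["ping", "-c", "3", "-W", "2", PING_TARGET]) == 0


def check_once(runner=run):
    """Ping once; if it fails, run the recovery steps. Returns True if it recovered."""
    if link_ok(runner):
        return False
    for step in RECOVERY_STEPS:
        runner(step)
    return True


def watch(interval=60, runner=run):
    """Loop forever, checking the link every `interval` seconds."""
    while True:
        if check_once(runner):
            print("link hung, reloaded", MODULE)
        time.sleep(interval)
```

Calling `watch()` as root in the guest runs the loop; for an e1000 guest, `MODULE` would be `"e1000"` instead.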



* Re: virtio_net hang
  2008-11-14  9:23     ` Emmanuel Lacour
@ 2008-11-14 18:26       ` Mark McLoughlin
  2008-11-18 18:37         ` Emmanuel Lacour
  0 siblings, 1 reply; 17+ messages in thread
From: Mark McLoughlin @ 2008-11-14 18:26 UTC
  To: Emmanuel Lacour; +Cc: kvm

On Fri, 2008-11-14 at 10:23 +0100, Emmanuel Lacour wrote:
> On Thu, Nov 13, 2008 at 04:24:52PM +0100, Emmanuel Lacour wrote:
> > On Thu, Nov 13, 2008 at 03:12:33PM +0000, Mark McLoughlin wrote:
> > > The fact that re-loading the virtio_net driver fixes things up makes me
> > > suspect you've found a bug in the virtio_net driver, rather than e.g. a
> > > bug in the kvm-userspace side.
> > > 
> > > To try and narrow down what's happening, when the interface has hung,
> > > try:
> > > 
> > >   - tcpdump on both eth0 in the guest and the tap device on the host 
> > >     (tap5 in your example)
> > > 
> 
> 
> On eth0 I see echo requests, but _no_ echo replies
> On tap5 I see echo requests _and_ echo replies

Okay, so the guest isn't receiving packets.

> > >   - look for anything unusual in the stats for both those interfaces,
> > >      e.g. /proc/net/dev, netstat -s etc.
> > > 
> 
> Comparing with other guests without problems, the only difference is that this
> tap (and only this one) reports "overruns":
> 
> tap5      Link encap:Ethernet  HWaddr 00:FF:AD:53:76:25  
>           inet6 addr: fe80::2ff:adff:fe53:7625/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:717737621 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:636626720 errors:0 dropped:0 overruns:317 carrier:0
>           collisions:0 txqueuelen:500 
>           RX bytes:368973099756 (343.6 GiB)  TX bytes:217917073227 (202.9 GiB)
> 
> Overruns seem to happen only when there is a "hang"; the counter doesn't seem
> to increase when the network is working properly.

Right, the tap device tx queue is full because kvm-userspace isn't
reading packets from it.

This could be because kvm-userspace has just stopped noticing that
there's data available from the tapfd or because virtio_net in the guest
has stopped noticing that packets are available in the ring.

One thing you could easily check is whether:

  ip link set eth0 down
  ip link set eth0 up

in the guest is sufficient to fix it. If it is, then it points to a guest
driver issue.

Cheers,
Mark.



* Re: virtio_net hang
  2008-11-14 18:26       ` Mark McLoughlin
@ 2008-11-18 18:37         ` Emmanuel Lacour
  2008-11-18 18:48           ` Emmanuel Lacour
  2008-11-19 13:13           ` Mark McLoughlin
  0 siblings, 2 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-18 18:37 UTC
  To: kvm

On Fri, Nov 14, 2008 at 06:26:44PM +0000, Mark McLoughlin wrote:
> 
> Right, the tap device tx queue is full because kvm-userspace isn't
> reading packets from it.
> 
> This could be because kvm-userspace has just stopped noticing that
> there's data available from the tapfd or because virtio_net in the guest
> has stopped noticing that packets are available in the ring.
> 
> One thing you could easily check is whether:
> 
>   ip link set eth0 down
>   ip link set eth0 up
> 
> in the guest is sufficient to fix it. If it is, then it points to a guest
> driver issue.
> 

I ran the test: putting the link down and then up fixes it.

So what can I do next time to help fix this?




* Re: virtio_net hang
  2008-11-18 18:37         ` Emmanuel Lacour
@ 2008-11-18 18:48           ` Emmanuel Lacour
  2008-11-19 13:13           ` Mark McLoughlin
  1 sibling, 0 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-18 18:48 UTC
  To: kvm

On Tue, Nov 18, 2008 at 07:37:57PM +0100, Emmanuel Lacour wrote:
> 
> I ran the test: putting the link down and then up fixes it.
> 
> So what can I do next time to help fix this?
> 

I had the problem one more time. I took an strace of the kvm process,
starting while the network was not working; I then ran a ping, then a link
down, link up, another ping (which worked again) and stopped the strace.

Is anyone interested in the strace to help me analyze it? ;)


* Re: virtio_net hang
  2008-11-18 18:37         ` Emmanuel Lacour
  2008-11-18 18:48           ` Emmanuel Lacour
@ 2008-11-19 13:13           ` Mark McLoughlin
  2008-11-19 19:03             ` Mark McLoughlin
  2008-11-20 11:34             ` Emmanuel Lacour
  1 sibling, 2 replies; 17+ messages in thread
From: Mark McLoughlin @ 2008-11-19 13:13 UTC
  To: Emmanuel Lacour; +Cc: kvm

On Tue, 2008-11-18 at 19:37 +0100, Emmanuel Lacour wrote:
> On Fri, Nov 14, 2008 at 06:26:44PM +0000, Mark McLoughlin wrote:
> > 
> > Right, the tap device tx queue is full because kvm-userspace isn't
> > reading packets from it.
> > 
> > This could be because kvm-userspace has just stopped noticing that
> > there's data available from the tapfd or because virtio_net in the guest
> > has stopped noticing that packets are available in the ring.
> > 
> > One thing you could easily check is whether:
> > 
> >   ip link set eth0 down
> >   ip link set eth0 up
> > 
> > in the guest is sufficient to fix it. If it is, then it points to a guest
> > driver issue.
> > 
> 
> I ran the test: putting the link down and then up fixes it.

Thanks, that's very interesting.

Since bringing the interface up and down basically just causes the
driver to re-schedule itself with NAPI, all I can see as a possibility
is that we somehow (e.g. a race condition) had gotten ourselves into a
state where we have rx ring interrupts disabled and we're not scheduled
with NAPI.

We synchronise around the NAPI_STATE_SCHED bit with atomic operations
and all the logic looks correct ... so I'm stumped, really.
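To make the suspected state concrete, here is a toy Python model (not the virtio_net source; all names are illustrative) of the general lost-wakeup window in a NAPI-style receive path. Packets that land after the poll's last ring check, while rx interrupts are still off, raise no interrupt; if they use up every posted buffer, the host has nothing left to fill, so no further interrupt ever comes. Re-checking the ring after re-enabling interrupts is the usual way NAPI-style drivers close such a window. The model only illustrates the idea and makes no claim about where the real race is.

```python
class ToyRxPath:
    """Toy model of a NAPI-style rx path; names are illustrative, not driver code."""

    def __init__(self, buffers, recheck):
        self.ring = []                # filled buffers the guest has not consumed
        self.free_buffers = buffers   # posted buffers the host may still fill
        self.irq_enabled = True       # host raises an rx interrupt on delivery
        self.poll_scheduled = False   # stands in for NAPI_STATE_SCHED
        self.recheck = recheck        # re-check the ring after re-enabling irqs

    def host_deliver(self, pkt):
        """Host side; returns False when no buffer is free."""
        if self.free_buffers == 0:
            return False
        self.free_buffers -= 1
        self.ring.append(pkt)
        if self.irq_enabled:          # interrupt: mask further irqs, schedule poll
            self.irq_enabled = False
            self.poll_scheduled = True
        return True

    def poll(self, window=None):
        """Guest poll: drain the ring, then complete and re-enable irqs."""
        while self.ring:
            self.ring.pop(0)
            self.free_buffers += 1    # buffer refilled and posted again
        if window is not None:
            window()                  # host runs after the last ring check
        self.poll_scheduled = False   # "complete" the poll
        self.irq_enabled = True
        if self.recheck and self.ring:
            self.irq_enabled = False  # work arrived in the window: poll again
            self.poll_scheduled = True

    def hung(self):
        """Packets pending, no poll scheduled, and the host cannot deliver more."""
        return bool(self.ring) and not self.poll_scheduled and self.free_buffers == 0


def run_scenario(recheck, buffers=4):
    rx = ToyRxPath(buffers, recheck)
    rx.host_deliver("p0")             # first packet: interrupt schedules a poll

    def fill_all_buffers():
        for i in range(buffers):
            rx.host_deliver(f"w{i}")

    rx.poll(window=fill_all_buffers)
    return rx.hung()


if __name__ == "__main__":
    print("without re-check, hung:", run_scenario(recheck=False))
    print("with re-check, hung:", run_scenario(recheck=True))
```

Run as a script, it prints `hung: True` for the variant without the re-check and `hung: False` for the variant with it.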

Is it possible for you to try a newer guest kernel?

Cheers,
Mark.



* Re: virtio_net hang
  2008-11-19 13:13           ` Mark McLoughlin
@ 2008-11-19 19:03             ` Mark McLoughlin
  2008-11-20 11:36               ` Emmanuel Lacour
  2008-11-20 11:34             ` Emmanuel Lacour
  1 sibling, 1 reply; 17+ messages in thread
From: Mark McLoughlin @ 2008-11-19 19:03 UTC
  To: Emmanuel Lacour; +Cc: kvm

On Wed, 2008-11-19 at 13:13 +0000, Mark McLoughlin wrote:
> On Tue, 2008-11-18 at 19:37 +0100, Emmanuel Lacour wrote:
> > On Fri, Nov 14, 2008 at 06:26:44PM +0000, Mark McLoughlin wrote:
> > > 
> > > Right, the tap device tx queue is full because kvm-userspace isn't
> > > reading packets from it.
> > > 
> > > This could be because kvm-userspace has just stopped noticing that
> > > there's data available from the tapfd or because virtio_net in the guest
> > > has stopped noticing that packets are available in the ring.
> > > 
> > > One thing you could easily check is whether:
> > > 
> > >   ip link set eth0 down
> > >   ip link set eth0 up
> > > 
> > > in the guest is sufficient to fix it. If it is, then it points to a guest
> > > driver issue.
> > > 
> > 
> > I ran the test: putting the link down and then up fixes it.
> 
> Thanks, that's very interesting.
> 
> Since bringing the interface up and down basically just causes the
> driver to re-schedule itself with NAPI, all I can see as a possibility
> is that we somehow (e.g. a race condition) had gotten ourselves into a
> state where we have rx ring interrupts disabled and we're not scheduled
> with NAPI.
> 
> We synchronise around the NAPI_STATE_SCHED bit with atomic operations
> and all the logic looks correct ... so I'm stumped, really.

I had a look at Emmanuel's strace log and it shows that qemu isn't
selecting on the tapfd, presumably because virtio_net_can_receive() sees
that we've exhausted all available receive buffers.

When qemu does poll the tapfd (after an ifdown/ifup in the guest), there
are a load of packets waiting in the queue and things proceed as normal.

That still jives with the theory that we're somehow getting into a state
where NAPI polling is de-scheduled while guest rx interrupts are also
disabled.
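Similarly, a toy Python model (an illustration, not qemu code) of the host-side rule described above. The main loop leaves the tap fd out of its read set while the NIC model reports that it cannot receive (a stand-in for `virtio_net_can_receive()`), so packets pile up in the tap queue until it reaches txqueuelen (500 for tap5 in this thread). The overrun counting is a simplification; the real tun and qdisc accounting differs.

```python
from collections import deque

TAP_TXQUEUELEN = 500   # txqueuelen shown for tap5 in this thread


class ToyTap:
    """Toy tap device: host stack -> bounded queue -> qemu reads."""

    def __init__(self, qlen=TAP_TXQUEUELEN):
        self.queue = deque()
        self.qlen = qlen
        self.overruns = 0

    def host_xmit(self, pkt):
        if len(self.queue) >= self.qlen:
            self.overruns += 1        # simplification: count every packet that finds it full
        else:
            self.queue.append(pkt)


def main_loop_iteration(tap, guest_rx_buffers):
    """One toy select() pass; returns how many packets reached the guest."""
    can_receive = guest_rx_buffers > 0   # stand-in for virtio_net_can_receive()
    if not can_receive:
        return 0                          # tap fd not in the read set: nothing is read
    delivered = 0
    while tap.queue and delivered < guest_rx_buffers:
        tap.queue.popleft()
        delivered += 1
    return delivered


def simulate(guest_rx_buffers, packets=600):
    """Host sends `packets` packets; the guest posts `guest_rx_buffers` per pass."""
    tap = ToyTap()
    delivered = 0
    for i in range(packets):
        tap.host_xmit(i)
        delivered += main_loop_iteration(tap, guest_rx_buffers)
    return tap, delivered


if __name__ == "__main__":
    for buffers in (1, 0):
        tap, delivered = simulate(buffers)
        print(f"buffers={buffers} delivered={delivered} "
              f"queued={len(tap.queue)} overruns={tap.overruns}")
```

With buffers posted on every pass nothing accumulates; with none, 500 packets sit in the queue (the backlog qemu finds once it polls the tapfd again) and the remaining 100 are counted as overruns.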

> Is it possible for you to try a newer guest kernel?

If you can try a newer kernel, or even try some debugging patches, that
would help a lot.

Cheers,
Mark.




* Re: virtio_net hang
  2008-11-19 13:13           ` Mark McLoughlin
  2008-11-19 19:03             ` Mark McLoughlin
@ 2008-11-20 11:34             ` Emmanuel Lacour
  1 sibling, 0 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-20 11:34 UTC
  To: kvm

On Wed, Nov 19, 2008 at 01:13:52PM +0000, Mark McLoughlin wrote:
> 
> Is it possible for you to try a newer guest kernel?
> 

The guest will be rebooted into 2.6.27.6 today.



* Re: virtio_net hang
  2008-11-19 19:03             ` Mark McLoughlin
@ 2008-11-20 11:36               ` Emmanuel Lacour
  2008-11-21  8:38                 ` Emmanuel Lacour
  2008-11-21 14:44                 ` Guido Günther
  0 siblings, 2 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-20 11:36 UTC
  To: kvm

On Wed, Nov 19, 2008 at 07:03:09PM +0000, Mark McLoughlin wrote:
> 
> I had a look at Emmanuel's strace log and it shows that qemu isn't
> selecting on the tapfd, presumably because virtio_net_can_receive() sees
> that we've exhausted all available receive buffers.
> 
> When qemu does poll the tapfd (after an ifdown/ifup in the guest), there
> are a load of packets waiting in the queue and things proceed as normal.
> 
> That still jives with the theory that we're somehow getting into a state
> where NAPI polling is de-scheduled while guest rx interrupts are also
> disabled.
> 
> > Is it possible for you to try a newer guest kernel?
> 
> If you can try a newer kernel, or even try some debugging patches, that
> would help a lot.
> 

The difficulty is that I cannot always reproduce the bug.

But another interesting thing is that I switched to e1000 and I had
another lockup after that, with the same symptoms :(

As I answered a few minutes ago, I will try 2.6.27.6 in the guest
today and let you know at the first problem I encounter, if any ;)



* Re: virtio_net hang
  2008-11-20 11:36               ` Emmanuel Lacour
@ 2008-11-21  8:38                 ` Emmanuel Lacour
  2008-11-22 14:20                   ` Emmanuel Lacour
  2008-11-21 14:44                 ` Guido Günther
  1 sibling, 1 reply; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-21  8:38 UTC
  To: kvm

On Thu, Nov 20, 2008 at 12:36:50PM +0100, Emmanuel Lacour wrote:
> The difficulty is that I cannot always reproduce the bug.
> 
> But another interesting thing is that I switched to e1000 and I had
> another lockup after that, with the same symptoms :(
> 
> As I answered a few minutes ago, I will try 2.6.27.6 in the guest
> today and let you know at the first problem I encounter, if any ;)
> 

I continue to have this problem with this setup:

- host 2.6.27.4, kvm-78, intel, debian etch 64bits
- guest 2.6.27.6, debian sarge 32 bits, e1000, 2 vcpus

Bringing the interface down and up is enough to recover.

There are 6 other guests on this host (a mix of Debian
woody/sarge/etch/lenny, 2.6 and 2.4 kernels); this is the only one with this
problem, but I think it is also the one with the most IP traffic
(http/ftp + rsync backups).



* Re: virtio_net hang
  2008-11-20 11:36               ` Emmanuel Lacour
  2008-11-21  8:38                 ` Emmanuel Lacour
@ 2008-11-21 14:44                 ` Guido Günther
  1 sibling, 0 replies; 17+ messages in thread
From: Guido Günther @ 2008-11-21 14:44 UTC
  To: Emmanuel Lacour; +Cc: kvm

On Thu, Nov 20, 2008 at 12:36:50PM +0100, Emmanuel Lacour wrote:
> On Wed, Nov 19, 2008 at 07:03:09PM +0000, Mark McLoughlin wrote:
> > 
> > I had a look at Emmanuel's strace log and it shows that qemu isn't
> > selecting on the tapfd, presumably because virtio_net_can_receive() sees
> > that we've exhausted all available receive buffers.
> > 
> > When qemu does poll the tapfd (after an ifdown/ifup in the guest), there
> > are a load of packets waiting in the queue and things proceed as normal.
> > 
> > That still jives with the theory that we're somehow getting into a state
> > where NAPI polling is de-scheduled while guest rx interrupts are also
> > disabled.
> > 
> > > Is it possible for you to try a newer guest kernel?
> > 
> > If you can try a newer kernel, or even try some debugging patches, that
> > would help a lot.
> > 
> 
> The difficulty is that I cannot always reproduce the bug.
> 
> But another interesting thing is that I switched to e1000 and I had
> another lockup after that, with the same symptoms :(
Same symptoms with the rtl8139 and kvm-userspace 78 running on 2.6.24
(kvm modules from kvm 78). I wasn't able to reproduce this with kvm 63
(nor have I found the time to do any further debugging yet).
Cheers,
 -- Guido


* Re: virtio_net hang
  2008-11-21  8:38                 ` Emmanuel Lacour
@ 2008-11-22 14:20                   ` Emmanuel Lacour
  0 siblings, 0 replies; 17+ messages in thread
From: Emmanuel Lacour @ 2008-11-22 14:20 UTC
  To: kvm

On Fri, Nov 21, 2008 at 09:38:23AM +0100, Emmanuel Lacour wrote:
> 
> I continue to have this problem with this setup:
> 
> - host 2.6.27.4, kvm-78, intel, debian etch 64bits
> - guest 2.6.27.6, debian sarge 32 bits, e1000, 2 vcpus
> 
> Bringing the interface down and up is enough to recover.
> 


Today I got a trace of this problem (for the first time); maybe this can help:

Nov 21 14:34:42 foo kernel: [58345.000104] ------------[ cut here ]------------
Nov 21 14:34:42 foo kernel: [58345.000617] WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x164/0x1fd()
Nov 21 14:34:42 foo kernel: [58345.001374] NETDEV WATCHDOG: eth0 (e1000): transmit timed out
Nov 21 14:34:42 foo kernel: [58345.001825] Modules linked in: nfsd auth_rpcgss exportfs ac battery ipv6 nfs lockd nfs_acl sunrpc nf_nat_ftp nf_conntrack_ftp xt_tcpudp iptable_mangle iptable_nat nf_nat ipt_REJECT xt_limit nf_conntrack_ipv4 xt_state nf_conntrack ipt_LOG ipt_ULOG iptable_filter ip_tables x_tables dm_mod sd_mod snd_pcsp virtio_balloon snd_pcm snd_timer snd soundcore floppy snd_page_alloc psmouse serio_raw virtio_pci i2c_piix4 button i2c_core evdev ext3 jbd mbcache ide_cd_mod cdrom ide_disk ata_generic ata_piix libata scsi_mod dock piix ide_core uhci_hcd e1000 usbcore thermal processor fan thermal_sys
Nov 21 14:34:42 foo kernel: [58345.022597] Pid: 0, comm: swapper Not tainted 2.6.27.6 #1
Nov 21 14:34:42 foo kernel: [58345.023027]  [<c0126295>] warn_slowpath+0x58/0x70
Nov 21 14:34:42 foo kernel: [58345.023470]  [<c028e96b>] ip_output+0x8e/0x90
Nov 21 14:34:42 foo kernel: [58345.029292]  [<c028e3ab>] ip_local_out+0x15/0x17
Nov 21 14:34:42 foo kernel: [58345.030231]  [<c028ec1d>] ip_queue_xmit+0x2b0/0x2f7
Nov 21 14:34:42 foo kernel: [58345.030684]  [<c01169c7>] pvclock_get_nsec_offset+0xb/0x59
Nov 21 14:34:42 foo kernel: [58345.031163]  [<c0116a79>] pvclock_clocksource_read+0x1a/0x2d
Nov 21 14:34:42 foo kernel: [58345.031642]  [<c01169c7>] pvclock_get_nsec_offset+0xb/0x59
Nov 21 14:34:42 foo kernel: [58345.032130]  [<c0116a79>] pvclock_clocksource_read+0x1a/0x2d
Nov 21 14:34:42 foo kernel: [58345.032608]  [<c01169c7>] pvclock_get_nsec_offset+0xb/0x59
Nov 21 14:34:42 foo kernel: [58345.033080]  [<c01169c7>] pvclock_get_nsec_offset+0xb/0x59
Nov 21 14:34:42 foo kernel: [58345.033570]  [<c01169c7>] pvclock_get_nsec_offset+0xb/0x59
Nov 21 14:34:42 foo kernel: [58345.034036]  [<c0116a79>] pvclock_clocksource_read+0x1a/0x2d
Nov 21 14:34:42 foo kernel: [58345.034506]  [<c027e572>] dev_watchdog+0x164/0x1fd
Nov 21 14:34:42 foo kernel: [58345.034951]  [<c0139cf7>] __atomic_notifier_call_chain+0x10/0x13
Nov 21 14:34:42 foo kernel: [58345.035455]  [<c02d4c73>] _spin_lock_bh+0xf/0x12
Nov 21 14:34:42 foo kernel: [58345.035898]  [<c012d9a4>] timer_stats_account_timer+0x22/0x27
Nov 21 14:34:42 foo kernel: [58345.036384]  [<c012dfcf>] run_timer_softirq+0x11f/0x183
Nov 21 14:34:42 foo kernel: [58345.036840]  [<c027e40e>] dev_watchdog+0x0/0x1fd
Nov 21 14:34:42 foo kernel: [58345.037278]  [<c01390dc>] hrtimer_interrupt+0x136/0x15e
Nov 21 14:34:42 foo kernel: [58345.037736]  [<c012a53f>] __do_softirq+0x69/0xd3
Nov 21 14:34:42 foo kernel: [58345.038172]  [<c012a5ed>] do_softirq+0x44/0x52
Nov 21 14:34:42 foo kernel: [58345.038608]  [<c012a67e>] irq_exit+0x38/0x6c
Nov 21 14:34:42 foo kernel: [58345.039035]  [<c01106f1>] smp_apic_timer_interrupt+0x2a/0x33
Nov 21 14:34:42 foo kernel: [58345.039518]  [<c0104434>] apic_timer_interrupt+0x28/0x30
Nov 21 14:34:42 foo kernel: [58345.039980]  [<c0116560>] native_safe_halt+0x2/0x3
Nov 21 14:34:42 foo kernel: [58345.040443]  [<c0109196>] default_idle+0x2e/0x54
Nov 21 14:34:42 foo kernel: [58345.040886]  [<c01020ff>] cpu_idle+0xc4/0xf7
Nov 21 14:34:42 foo kernel: [58345.041310]  =======================
Nov 21 14:34:42 foo kernel: [58345.041692] ---[ end trace 934a9cb836d2434b ]---
Nov 21 14:34:42 foo kernel: [58345.080575] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None


Thread overview: 17+ messages
2008-11-13 12:27 virtio_net hang Emmanuel Lacour
2008-11-13 13:04 ` Daniel P. Berrange
2008-11-13 13:15   ` Emmanuel Lacour
2008-11-13 15:12 ` Mark McLoughlin
2008-11-13 15:24   ` Emmanuel Lacour
2008-11-14  9:23     ` Emmanuel Lacour
2008-11-14 18:26       ` Mark McLoughlin
2008-11-18 18:37         ` Emmanuel Lacour
2008-11-18 18:48           ` Emmanuel Lacour
2008-11-19 13:13           ` Mark McLoughlin
2008-11-19 19:03             ` Mark McLoughlin
2008-11-20 11:36               ` Emmanuel Lacour
2008-11-21  8:38                 ` Emmanuel Lacour
2008-11-22 14:20                   ` Emmanuel Lacour
2008-11-21 14:44                 ` Guido Günther
2008-11-20 11:34             ` Emmanuel Lacour
2008-11-13 18:27 ` Fabio Coatti
