qemu-devel.nongnu.org archive mirror
* [Qemu-devel] How to lock up your tap-based VM network
From: Jan Kiszka @ 2010-04-12 16:43 UTC
  To: qemu-devel

Hi,

we found an ugly issue with the (pseudo) flow-control mechanism in
tap-based networks:

In recent Linux kernels (>= 2.6.30), the tun driver does TX queue length
accounting and stops sending packets if the local receivers do not
consume (free) enough of them. This aims at throttling the TX side when
the RX side is temporarily unable to run (e.g. because of CPU
overcommitment). Before that, packets risked being dropped in this
scenario. Unfortunately, this approach is fragile and even
counterproductive in some scenarios.

It is fragile because accounting is done based on skb->truesize on the
sender side, while it is purely packet counting on the receiver side.
net/tap-linux.c claims:

> /* sndbuf should be set to a value lower than the tx queue
>  * capacity of any destination network interface.
>  * Ethernet NICs generally have txqueuelen=1000, so 1Mb is
>  * a good default, given a 1500 byte MTU.
>  */
> #define TAP_DEFAULT_SNDBUF 1024*1024

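This works for maximum-sized packets, but fails for minimum-sized ones:
their skb->truesize is much smaller, so many more packets than a
txqueuelen of 1000 fit into the 1 MB sndbuf before the sender is
throttled.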

But things get worse: consider a local bridge with two VMs attached via
taps, and maybe a third interface used to connect to the world. If one
VM decides to shut down its interface, its tap will keep queuing packets
directed to it or sent as broadcast/multicast to the bridge (500 by
default) until the queue overruns and finally starts dropping. If most
of those packets came from the other VM, that one will run out of
sndbuf before that point! Simple test: ifdown on the one side,
ping -b -s 1472 on the other, and you will lock out the second VM. This
has happened in the field, creating some unhappy customers. I see the
point in avoiding packet drops, but this can only work as best effort
and must not cause such deadlocks.

A major reason for this deadlock could likely be removed by shutting
down the tap (if peered) or dropping packets in user space (in case of
vlan) when a NIC is stopped or otherwise shut down. Currently most (if
not all) NIC models seem to signal both "queue full" and "RX disabled"
via !can_receive(). This should be changed, probably by returning a
reason for "can't receive" so that the network layer can decide what to do.

Opinions? Better suggestions?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


